Abstract

We investigate absolute bounds (or inequalities) on the mean and standard deviation of transformed data
values, given only a few statistics on the original set of data values. Our work applies primarily to
transformation functions whose derivatives are constant-sign for a positive range (e.g. logarithm, antilog,
square root, and reciprocal). With such functions we can often get reasonably tight absolute bounds, so that
distributional assumptions about the data needed for confidence intervals can be eliminated. We investigate a
variety of methods for obtaining such bounds, first examining bounding curves which are straight lines, then
those that are quadratic polynomials. While the problem of finding the best quadratic bound is an
optimization problem with no closed-form solution, we display a variety of closed-form quadratic bounds
which can come close to the optimal solution. We emphasize what can be done with prior knowledge of the
mean and standard deviation of the untransformed data values, but do address some other statistics too.
1. Introduction

Standard transformations of numeric data values such as logarithm, antilog, square root, square, cube, and
reciprocal are frequently appropriate as a prelude to statistical analysis of finite data sets [7]. Sometimes,
however, the data are already aggregated into counts and means, and the original data values lost. This
happens when the original data is too large to handle and/or contains sensitive information, as with the
U.S. Census, which publishes much of its data as aggregates. We may also deliberately create "database
abstracts" of aggregate statistics to facilitate quick statistical estimates by "antisampling" methods [10].
Statistics on the transformed values cannot be calculated uniquely when the original data is so preaggregated [footnote 1].
But if we are doing exploratory data analysis [13, 6], an estimate of a statistic on the transformed data may be
all that we need. We address one set of methods for obtaining such estimates, by finding absolute
(unconditionally guaranteed) bounds on the mean and standard deviation for data under some common
transformations.

Absolute bounds are the only true "nonparametric" form of estimate, and as such have advantages.
Compared to "reasonable-guess" estimates [9], biasedness of the estimator need not be dealt with, while at the
same time providing numbers close to the true answer for this category of problems. As [7] discusses,
confidence intervals for the mean and standard deviation of transformed data are difficult to obtain and
methods are subject to exceptions, and thus absolute bounds easily obtained are appealing. Tight enough
absolute bounds can be equivalent to a good estimate. An estimate of a statistic can also be logically incorrect
when bounds are tight, i.e. it may not be a statistic of any possible distribution consistent with the constraints.
Bounds are useful for other reasons as well. Some algorithms exploit only bounds, as the "branch and
bound" methods of [4] for retrieval of information from a database. Other advantages we have investigated in
previous work [10, 11, 12]. In addition, the mathematics of absolute bounds is straightforward and requires
only elementary calculus.

Our approach is to give a variety of bounds formulae for the same estimation situation. In general, we do
not know which of several bounding methods will be the best for a problem, and this suggests the program
architecture of an artificial-intelligence "production system" [1]. We can combine results by taking the
minimum of all the upper bounds, and the maximum of all the lower bounds.
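The combination rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `combine_bounds` and the sample numbers are assumptions made for the example, not part of the original report.

```python
def combine_bounds(lower_bounds, upper_bounds):
    """Combine several valid bounds on one quantity into the tightest pair.

    Every element of lower_bounds is assumed to be a guaranteed lower bound
    and every element of upper_bounds a guaranteed upper bound.
    """
    lower = max(lower_bounds)   # the largest lower bound is the tightest
    upper = min(upper_bounds)   # the smallest upper bound is the tightest
    if lower > upper:
        raise ValueError("bounds are mutually inconsistent")
    return lower, upper

# Two hypothetical bounding methods applied to the same statistic.
combined = combine_bounds([2.635, 2.80], [3.135, 3.13])
print(combined)
```

Because each input is an absolute bound, the combined interval is never wider than the interval from any single method.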



Footnote 1: Even if the data is transformed before being aggregated, there are still many reasons to want statistics on the untransformed data. To
use the example of [7], it is useful to study rainfall in the cube root of inches, but one may then be interested in statistics on the cube of
that, the meaningful quantity of total volume.
2. Our approach

In this work we examine transformation functions whose derivatives have a constant sign in the interval of
study. (We may be able to relax this restriction in particular cases, however; usually only a constant-sign
second derivative is necessary. Chapter 3 of [5] discusses detailed restrictions, in particular the notion of
function convexity, for the material we cover in section 3 below.) The so-called "power transformations" and
their inverses [2] satisfy this constant-sign restriction for positive data values. Six common power
transformations are log, antilog, square root, square, cube, and reciprocal, and these will be our primary
examples. Logarithm is particularly important because the mean of the logs is the log of the geometric mean
of a set of data values; reciprocal is also important because it provides the key to handling quotients of
random variables. To summarize the six example transformations:



Function      first deriv.   second deriv.   steepest point
$\ln(x)$      +              -               left side
$e^x$         +              +               right side
$\sqrt{x}$    +              -               left side
$x^2$         +              +               right side
$x^3$         +              +               right side
$1/x$         -              +               left side

We shall assume the following statistics on the original (untransformed) data values are known:

• $\mu$, the mean of the values (or equivalently, the sum of the values and the number of values)

• m, the minimum of the values

• M, the maximum of the values

Even when we do not know the minimum and maximum exactly, we can often assume extreme "safe" values
which the minimum cannot be less than and the maximum cannot be greater than, and which we can use in
our formulae. So it is reasonable to believe we can always come up with a minimum and maximum for a set
of values.

In much of what follows we also assume the following is known:

• $\sigma$, the standard deviation of the values, defined as $\sqrt{\sum_{i=1}^{n}(x_i-\mu)^2/n}$, instead of the more
conventional formula with a denominator of n-1

Note we use the symbols $\mu$ and $\sigma$ to emphasize that we are considering finite data populations, which are not
necessarily samples of anything.



We shall ignore linear transformations of variables as a preliminary to applying power functions, since
these can be handled trivially. For instance, $h(x) = \ln(ax + b)$ can be analyzed by defining $y = ax + b$ and
analyzing $g(y) = \ln(y)$, where $\mu_y = a\mu_x + b$ and $\sigma_y = |a|\sigma_x$.

Our basic idea is to find functions that are (a) entirely above, and (b) entirely below the curve of the
function on the data-value interval. We shall consider two important cases: bounding curves that are straight
lines (sections 3 and 4) and bounding curves that are second-degree polynomials (quadratics) (sections 5, 6, 7,
and 8). Subsequent sections consider extensions to this framework: use of subset means and standard
deviations in section 9, use of order statistics in section 10, use of distribution fits in section 11, and adjustments for
small populations in section 12. We conclude with some simple test experiments in section 13.

3. Linear bounds on the mean 

3.1. Overview

For straight lines, one bounding line can be a tangent to the curve at some point (for convenience, the mean); the
other a secant of the curve through it at the minimum and the maximum. For curves with negative second
derivative like logarithm and square root, the tangent is an upper bound, the secant a lower bound; for curves
with positive second derivative like antilog and reciprocal, the tangent is the lower bound and the secant the
upper. These bounding lines map directly into bounds on the mean and standard deviation, for note that if
$ax + b \ge f(x)$ for all x in a range, with f some transformation function satisfying our restrictions and E denoting
expected value, then

$E(ax + b) \ge E(f(x))$, or
$aE(x) + b \ge E(f(x))$, or
$a\mu + b \ge E(f(x))$

$E(f(x))$ being the quantity we are interested in bounding.

3.2. Linear bounds on the mean 

Let us apply these ideas to the mean of transformed values (see figure 3-1). The tangent to f(x) at $\mu$ has
equation

$y = f(\mu) + (x-\mu)f'(\mu)$

This leads to a well-known bound (generalized in [5], p. 70):

$f(\mu)$

On the other side of the curve, the secant through the maximum and minimum forms a bound. This line has
equation

$y = x \cdot \frac{f(M)-f(m)}{M-m} + \left[ f(m) - m \cdot \frac{f(M)-f(m)}{M-m} \right]$

which corresponds to the bound

$\mu \cdot \frac{f(M)-f(m)}{M-m} + \left[ f(m) - m \cdot \frac{f(M)-f(m)}{M-m} \right]$
$= (\mu-m) \cdot \frac{f(M)-f(m)}{M-m} + f(m)$
$= (1-\alpha)f(m) + \alpha f(M) = f(m) + \alpha(f(M)-f(m))$

where $\alpha = (\mu-m)/(M-m)$

Figure 3-1: Linear bounds on the mean of transformed values

To give an example, if a set of data values ranges from 10 to 100, and the mean is 23, the mean of the
logarithms of the data values has

an upper bound of $\ln(23) = 3.135$
a lower bound of $(77/90)\ln(10) + (13/90)\ln(100) = 2.635$

Hence the geometric mean of the original data values is between $e^{2.635} = 13.9$ and $e^{3.135} = 23$. In general
from these formulae, the geometric mean is between $\mu$ and $m(M/m)^{\alpha}$; and the harmonic mean is between $\mu$
and $1/[1/m + 1/M - \mu/(mM)]$.
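As a check on the worked example, the tangent and secant bounds can be computed directly. The following is a minimal sketch; the helper name `linear_mean_bounds` is an assumption introduced for illustration, and the function presumes f has a constant-sign second derivative on [m, M].

```python
import math

def linear_mean_bounds(f, m, M, mu):
    """Return (lower, upper) linear bounds on the mean of f(x).

    One bound is the tangent at the mean, f(mu); the other is the secant
    through (m, f(m)) and (M, f(M)) evaluated at mu. Which one is the upper
    bound depends on the sign of the second derivative, so we just sort them.
    """
    alpha = (mu - m) / (M - m)
    tangent = f(mu)
    secant = (1 - alpha) * f(m) + alpha * f(M)
    return min(tangent, secant), max(tangent, secant)

lower, upper = linear_mean_bounds(math.log, 10, 100, 23)
print(round(lower, 3), round(upper, 3))          # 2.635 3.135
print(round(math.exp(lower), 1), round(math.exp(upper), 1))  # geometric-mean range
```

Passing `lambda x: 1 / x` instead of `math.log` gives the corresponding bounds on the mean reciprocal, from which the harmonic-mean range follows.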

3.3. Proof that tangent at the mean is optimal 

Note that the bound obtained from taking the tangent at $\mu$ is optimal for the conditions we are assuming on
f. To see this, suppose we use the tangent at some other point t, i.e. the line $y = f(t) + (x-t)f'(t)$. Then the
mean on this bound line is

$E[f(t) + (x-t)f'(t)] = f(t) + (\mu-t)f'(t)$

Now we want to find the extreme value of this as t varies, so we take the derivative with respect to t and set it
equal to zero:

$f'(t) - f'(t) + (\mu-t)f''(t) = 0 = (\mu-t)f''(t)$

But since we assumed that f had a constant-sign second derivative in the interval of interest, the only way this
can be zero is if $\mu = t$. Hence the only extreme value for the bound will be when we take a tangent at $\mu$: a
minimum for downwards-curving functions, and a maximum for upwards-curving.

3.4. Miscellaneous comments 

In the case of a negative second derivative, the tangent bound is an upper bound, and the secant bound a
lower bound; otherwise, the reverse. Note the two bounds are related, because they can be rewritten as

$f((1-\alpha)m + \alpha M)$ and
$(1-\alpha)f(m) + \alpha f(M)$, where $\alpha = (\mu-m)/(M-m)$

so they represent interchanging of a weighting and functional application.



Here is a table of the linear bounds for our six common transformations:

Function       Upper mean bound                 Lower mean bound
natural log    $\ln(\mu)$                       $(1-\alpha)\ln(m) + \alpha\ln(M)$
antilog        $(1-\alpha)e^m + \alpha e^M$     $e^{\mu}$
square root    $\sqrt{\mu}$                     $(1-\alpha)\sqrt{m} + \alpha\sqrt{M}$
square         $(1-\alpha)m^2 + \alpha M^2$     $\mu^2$
cube           $(1-\alpha)m^3 + \alpha M^3$     $\mu^3$
reciprocal     $(1-\alpha)/m + \alpha/M$        $1/\mu$

where $\alpha = (\mu-m)/(M-m)$



3.5. Accuracy of linear mean bounds 

To illustrate effectiveness of the bounds, we tabulate the bounds for m = 10, M = 100, f = ln, and for
$\mu$ = 19, 28, 37, 46, 55, 64, 73, 82, and 91. The "bounds range fraction" is the ratio of the distance between the
bounds to the total range of the function on the values, the difference between f(M) and f(m); it indicates the
quality of the estimate.



mean ($\mu$)   upper bound   lower bound   bounds range fraction
19             2.944         2.533         .179
28             3.332         2.763         .247
37             3.611         2.993         .268
46             3.829         3.224         .263
55             4.007         3.454         .240
64             4.159         3.684         .206
73             4.290         3.914         .163
82             4.407         4.145         .114
91             4.511         4.374         .059



It is typical that the estimates are best for extreme $\mu$, and the error is worst for a particular value inside the
range. We can calculate this value. Assume f has negative second derivative (the other case is analogous).
Then we want to find the maximum of the function representing the difference of the tangent and secant
bounds, or

$g(\mu) = f(\mu) - (1-\alpha)f(m) - \alpha f(M)$, where $\alpha = (\mu-m)/(M-m)$

We find this by setting to zero the derivative with respect to $\mu$, in other words

$dg(\mu)/d\mu = 0 = f'(\mu) + f(m)/(M-m) - f(M)/(M-m)$
$f'(\mu) = (f(M)-f(m))/(M-m)$

Or in other words, the maximum error occurs for any function f (that satisfies our conditions) for a mean at
the point where the tangent to f is parallel to the secant through the endpoints. This makes sense because this
is the point at which f(x) stops "turning away" from the secant and begins turning back towards it. Note by
Rolle's Theorem there is always at least one such point where the lines are parallel, and the constant sign of the
second derivative ensures that there is never more than one such point.



For specific f we can tabulate the point of maximum error from this formula, as a function of m and M.

Function                Worst $\mu$
natural log (ln x)      $(M-m)/\ln(M/m)$
antilog ($e^x$)         $\ln[(e^M-e^m)/(M-m)]$
square root             $(M-m)^2/(4[\sqrt{M}-\sqrt{m}]^2)$
square                  $(M+m)/2$
cube                    $\sqrt{(m^2+mM+M^2)/3}$
reciprocal              $\sqrt{mM}$

The maximum error may then be obtained as $|f(\mu_{worst}) - f(m) - (\mu_{worst}-m)(f(M)-f(m))/(M-m)|$.
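The closed-form worst-case mean can be sanity-checked numerically. The sketch below does this for f = ln with m = 10 and M = 100 by scanning a grid of candidate means; the helper name `bounds_gap` and the grid resolution are assumptions chosen for illustration.

```python
import math

def bounds_gap(f, m, M, mu):
    """Distance between the tangent bound and the secant bound at mean mu."""
    alpha = (mu - m) / (M - m)
    return abs(f(mu) - (1 - alpha) * f(m) - alpha * f(M))

m, M = 10.0, 100.0

# Closed form for the natural log: the tangent is parallel to the secant here.
worst_mu = (M - m) / math.log(M / m)

# Brute-force check over a grid of means with step 0.01.
grid = [m + k * (M - m) / 9000 for k in range(9001)]
grid_worst = max(grid, key=lambda mu: bounds_gap(math.log, m, M, mu))

max_gap = bounds_gap(math.log, m, M, worst_mu)
range_fraction = max_gap / (math.log(M) - math.log(m))
print(worst_mu, grid_worst, max_gap, range_fraction)
```

The resulting range fraction is slightly above the largest entry of the preceding table, as expected, since the worst mean falls between the tabulated values 37 and 46.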



3.6. Bounds on the standard deviation, given mean 

A simple application of the linear bounds on the mean of transformed values is to bounding the standard
deviation of a set of values given only the maximum (M), minimum (m), and mean ($\mu$). The variance is
computed:

$\sum(x-\mu)^2/n = \sum x^2/n - \mu^2$

But since square is a continuous function with a constant-sign second derivative, we can bound the second
summation, and hence the bounds on the variance are:

lower bound: $\mu^2 - \mu^2 = 0$
upper bound: $m^2 + (\mu-m)(M^2-m^2)/(M-m) - \mu^2 = \mu M + \mu m - mM - \mu^2 = (\mu-m)(M-\mu)$

And so the bounds on the standard deviation are:

lower bound: 0
upper bound: $\sqrt{(\mu-m)(M-\mu)}$

We will use this result frequently.
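The upper bound is attained when every value sits at one of the two endpoints. The sketch below illustrates this with a constructed data set; the function name `max_std_given_mean` and the sample data are assumptions made for the example.

```python
import math
import statistics

def max_std_given_mean(m, M, mu):
    """Upper bound on the population standard deviation given min, max, mean."""
    return math.sqrt((mu - m) * (M - mu))

bound = max_std_given_mean(10, 100, 23)

# A two-point population with mean 23: 77 values at 10 and 13 values at 100.
extreme_data = [10] * 77 + [100] * 13
print(statistics.mean(extreme_data), statistics.pstdev(extreme_data), bound)

# Any other population with the same min, max, and mean has a smaller spread.
milder_data = [10, 15, 20, 25, 30, 100] + [18] * 4 + [19, 19, 19, 21]
print(statistics.pstdev(milder_data) <= max_std_given_mean(10, 100, statistics.mean(milder_data)))
```

Note that `statistics.pstdev` uses the denominator n, matching the definition of the standard deviation adopted in section 2.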



4. Linear bounds on the standard deviation 

There are two methods we can use to bound the standard deviation of a set of transformed values. First, we
can use the two bounds lines used previously, bound the sum of the squares, and subtract out the effect of the
mean (i.e. use the formula $\sum x^2/n - [\sum x/n]^2$). Second, we can construct two new lines passing through f(x) at
the mean of the transformed values.
4.1. Sum-of-squares bounds

Bound line $y = ax + b$ has second moment (sum of squares divided by n) equal to

$E[(ax + b)^2] = E[a^2x^2 + 2abx + b^2] = a^2(\sigma^2 + \mu^2) + 2ab\mu + b^2 = (a\mu + b)^2 + a^2\sigma^2$

For our two bounds lines:

tangent: $a = f'(\mu)$, $b = f(\mu) - \mu f'(\mu)$
secant: $a = (f(M)-f(m))/(M-m)$, $b = f(m) - m(f(M)-f(m))/(M-m)$

hence the tangent bound on the sum of the squares is

$(\sigma^2+\mu^2)[f'(\mu)]^2 + 2\mu f'(\mu)[f(\mu) - \mu f'(\mu)] + [f(\mu) - \mu f'(\mu)]^2$
$= \sigma^2[f'(\mu)]^2 + [f(\mu)]^2$

and the secant bound is

$\beta^2(\sigma^2 + \mu^2) + 2\mu\beta[f(m) - m\beta] + [f(m) - m\beta]^2$
where $\beta = [f(M)-f(m)]/[M-m]$

To find bounds on the variance, then, we subtract the square of the lower bound on the mean from the
larger of these two bounds to get the upper bound; and subtract the square of the upper bound on the mean
from the smaller of these two bounds to get the lower bound. The standard deviation then has upper
bound the square root of the variance upper bound, and lower bound the square root of the variance lower
bound (or zero if that variance bound is negative).

To return to our previous example, suppose f = ln, m = 10, M = 100, $\mu = 23$, and also suppose $\sigma = 10$. Then
the bounds on the sum of squares are

tangent: $629 \cdot (1/23)^2 + 2 \cdot 23 \cdot (1/23) \cdot [\ln(23) - 23 \cdot (1/23)] + [\ln(23) - 23 \cdot (1/23)]^2$
$= 1.19 + 4.27 + 4.56 = 10.02$

secant: $\beta = \ln(100/10)/(100-10) = .02558$; hence bound is
$(.02558)^2 \cdot 629 + 2 \cdot 23 \cdot .02558 \cdot [\ln(10) - 10 \cdot .02558] + [\ln(10) - 10 \cdot .02558]^2$
$= .412 + 2.409 + 4.189 = 7.010$

Now since the bounds on the mean are 2.635 and 3.135 from our analysis in section 3, the bounds on the
square of the mean are 6.94 and 9.83. Hence bounds on the variance are 10.02 - 6.94 = 3.08 and
7.01 - 9.83 = -2.82, and bounds on the standard deviation are thus $\sqrt{3.08} = 1.75$ and 0.
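The arithmetic of this example can be reproduced with a short script. It is a sketch of the sum-of-squares procedure as described in this section; the helper name `line_second_moment` is an assumption introduced for illustration.

```python
import math

def line_second_moment(a, b, mu, sigma):
    """E[(a*x + b)^2] for data with mean mu and population std. dev. sigma."""
    return (a * mu + b) ** 2 + (a * sigma) ** 2

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
f = math.log
f_prime = lambda x: 1.0 / x

beta = (f(M) - f(m)) / (M - m)                      # secant slope
tangent_sq = line_second_moment(f_prime(mu), f(mu) - mu * f_prime(mu), mu, sigma)
secant_sq = line_second_moment(beta, f(m) - m * beta, mu, sigma)

mean_lower = f(m) + (mu - m) * beta                 # secant bound on the mean
mean_upper = f(mu)                                  # tangent bound on the mean

var_upper = max(tangent_sq, secant_sq) - mean_lower ** 2
var_lower = max(0.0, min(tangent_sq, secant_sq) - mean_upper ** 2)  # clip at zero

print(tangent_sq, secant_sq, math.sqrt(var_upper), math.sqrt(var_lower))
```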

4.2. Special standard-deviation bounds lines 

To bound the standard deviation of the transformed values we can use different bound lines than for the
mean. First, let us assume we know an exact value for the mean of the transformed data values; call it $\varphi$.
Distance from $\varphi$ to each transformed data value is what needs to be linearly bounded, so we use secants
through f(x) at the point where it takes the value $\varphi$ (see figure 4-1). We assume f(x) is monotonic, and hence
$f^{-1}(\varphi)$ is unique, so let $f^{-1}(\varphi) = \nu$ (i.e., $\varphi = f(\nu)$). So to get an upper bound on the standard deviation
of the transformed values, we use a line below f(x) for $x < \nu$, and above for $x > \nu$; and to get a lower bound,
a line above f(x) for $x < \nu$, and below for $x > \nu$. (Vice versa for a monotonically decreasing f(x).) Now since
we assume f(x) has a constant-sign second derivative in the interval, the line segment from m to $\nu$ must lie
constantly to one side of f(x), and similarly the line segment from $\nu$ to M. Hence choose the extensions of
those two line segments into lines as our bounds lines. These lines have equations

$y = (x-\nu)(f(\nu)-f(m))/(\nu-m) + f(\nu)$
$y = (x-\nu)(f(M)-f(\nu))/(M-\nu) + f(\nu)$

Now:

$\sigma_f^2 = E[(y-f(\nu))^2]$

And if $y = s(x-\nu) + f(\nu)$ for a line of slope s, this is:

$E[[s(x-\nu) + f(\nu) - f(\nu)]^2] = E[s^2(x-\nu)^2]$

Hence, using the formula relating the variance (the second moment about the mean) to the second moment
about $\nu$, namely $E[(x-\nu)^2] = \sigma^2 + (\mu-\nu)^2$, the variance of the transformed values is bounded by

$[\sigma^2 + (\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2$
and
$[\sigma^2 + (\mu-\nu)^2][(f(M)-f(\nu))/(M-\nu)]^2$

Hence the standard deviation is bounded by

$\sqrt{\sigma^2+(\nu-\mu)^2}\,[(f(\nu)-f(M))/(\nu-M)]$ and
$\sqrt{\sigma^2+(\nu-\mu)^2}\,[(f(\nu)-f(m))/(\nu-m)]$

They are upper and lower bounds respectively for curves with positive second derivative, and vice versa for
negative second derivative. Hence the bounds are just an "adjusted" standard deviation of the original values
times the slopes of the lines from the mean of the transformed values to the minimum and maximum on the
interval.

Note since

$\sigma f'(\nu)$ is between $\sigma[(f(\nu)-f(m))/(\nu-m)]$
and $\sigma[(f(M)-f(\nu))/(M-\nu)]$, for $f''(x)$ constant-sign

a rough approximation of the standard deviation of the transformed values (as opposed to a bound) may always
be obtained from $\sigma f'(\nu)$, and this will be an increasingly good approximation as $\sigma$ gets smaller. Also note that
for a narrow range of mean bounds, the difference between our standard deviation bounds is a rough
approximation of the second derivative of f at $\nu$:

$\sigma[(f(M)-f(\nu))/(M-\nu)] - \sigma[(f(\nu)-f(m))/(\nu-m)] \approx 2\sigma f''(\nu)$

So the width of the bounds varies proportionately with the magnitude of the second derivative at the mean of
the transformed values.









Figure 4-1: Linear bounds on the standard deviation of transformed values
4.3. Handling inexact transform means

But this assumes we know $\nu$, the value that maps to the mean of the transformed values, exactly. We do for the
square function, for instance. Otherwise there is an adjustment we can make. Let the upper and lower bounds on
the value $\nu$ which maps to the transform mean be $\nu_U$ and $\nu_L$. Then the bounds on the variance of the
transformed values are

$\max[\max_{\nu_L \le \nu \le \nu_U}[\sigma^2+(\mu-\nu)^2][(f(\nu)-f(M))/(\nu-M)]^2, \max_{\nu_L \le \nu \le \nu_U}[\sigma^2+(\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2]$
and $\min[\min_{\nu_L \le \nu \le \nu_U}[\sigma^2+(\mu-\nu)^2][(f(\nu)-f(M))/(\nu-M)]^2, \min_{\nu_L \le \nu \le \nu_U}[\sigma^2+(\mu-\nu)^2][(f(\nu)-f(m))/(\nu-m)]^2]$

Since max(max(g(x)s(x)), max(h(x)s(x))) = max(max(g(x)s(x), h(x)s(x))) = max(max(g(x), h(x))s(x)) for
nonnegative s(x), we can simplify:

$\max_{\nu_L \le \nu \le \nu_U}[\max[[(f(\nu)-f(M))/(\nu-M)]^2, [(f(\nu)-f(m))/(\nu-m)]^2] \cdot [\sigma^2+(\mu-\nu)^2]]$
and $\min_{\nu_L \le \nu \le \nu_U}[\min[[(f(\nu)-f(M))/(\nu-M)]^2, [(f(\nu)-f(m))/(\nu-m)]^2] \cdot [\sigma^2+(\mu-\nu)^2]]$

First, suppose f(x) is monotonically increasing (like all of our six important functions except 1/x). If the
second derivative is positive, then the inner max is the first subexpression in the first bound above, and the
inner min is the second subexpression in the second bound. We can then rewrite the formulae:

$\max_{\nu_L \le \nu \le \nu_U}[(f(\nu)-f(M))/(\nu-M)]^2 \cdot [\sigma^2+(\mu-\nu)^2]$
and $\min_{\nu_L \le \nu \le \nu_U}[(f(\nu)-f(m))/(\nu-m)]^2 \cdot [\sigma^2+(\mu-\nu)^2]$

Note that these represent the product of two functions which are both monotonically increasing with respect
to $\nu$. (For a monotonically increasing f(x) with positive second derivative, $\mu$ is a lower bound on $\nu$, so
$(\mu-\nu)^2$ increases with $\nu$.) The product of two positive monotonically
increasing functions is a monotonically increasing function. The max of a monotonically increasing function
is the value at the rightmost point, and the min is at the leftmost point. So the revised bounds on the variance
of the transformed values, given f(x) increasing and with positive second derivative, are

upper: $[(f(\nu_U)-f(M))/(\nu_U-M)]^2 \cdot [\sigma^2 + (\mu-\nu_U)^2]$
lower: $[(f(\nu_L)-f(m))/(\nu_L-m)]^2 \cdot [\sigma^2 + (\mu-\nu_L)^2]$

Similarly if f(x) has a negative second derivative (again, assuming the first derivative is positive), we can show
by analogous reasoning that the bounds are:

upper: $[(f(\nu_L)-f(m))/(\nu_L-m)]^2 \cdot [\sigma^2 + (\mu-\nu_L)^2]$
lower: $[(f(\nu_U)-f(M))/(\nu_U-M)]^2 \cdot [\sigma^2 + (\mu-\nu_U)^2]$

Using our example of f = ln, m = 10, M = 100, $\mu = 23$, $\sigma = 10$, we use the previously found linear bounds on
the mean of the logarithms, giving $\nu_U = e^{3.135} = 23$ and $\nu_L = e^{2.635} = 13.9$. Hence bounds on the standard deviation
of the logarithms are:

$\sqrt{10^2+9.1^2}\,[(2.635-\ln(10))/(13.9-10)] \approx 1.15$
$\sqrt{10^2+0^2}\,[(\ln(100)-3.135)/(100-23)] \approx 0.191$

both being better than the sum-of-squares bounds in section 4.1.
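The slope-based bounds for a monotonically increasing f can be sketched as follows. The function name `slope_sd_bounds`, its argument list, and the `convex` flag are assumptions introduced for this illustration; the caller supplies the inverse of f and the bounds on the transformed mean.

```python
import math

def slope_sd_bounds(f, f_inverse, m, M, mu, sigma, mean_lo, mean_hi, convex):
    """Return (lower, upper) bounds on the std. dev. of f(x), f increasing.

    mean_lo and mean_hi bound the mean of f(x); convex is True when the
    second derivative of f is positive on [m, M], False when negative.
    """
    nu_lo, nu_hi = f_inverse(mean_lo), f_inverse(mean_hi)

    def scaled_slope(nu, end):
        # Chord slope from nu to an interval end, times the adjusted std. dev.
        slope = (f(nu) - f(end)) / (nu - end)
        return slope * math.sqrt(sigma ** 2 + (mu - nu) ** 2)

    if convex:
        return scaled_slope(nu_lo, m), scaled_slope(nu_hi, M)
    return scaled_slope(nu_hi, M), scaled_slope(nu_lo, m)

sd_lower, sd_upper = slope_sd_bounds(math.log, math.exp, 10.0, 100.0, 23.0, 10.0,
                                     2.635, 3.135, convex=False)
print(sd_lower, sd_upper)
```

The upper value differs slightly from the hand calculation in the text because the script does not round $\nu_L$ to 13.9 before use.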



Unfortunately, revised formulae for monotonically decreasing functions are not as easy. The partial
derivative of the bounds expressions must be set to zero and inverted. Consider the case for the upper bound
for a decreasing curve with a negative second derivative:

$0 = \partial/\partial\nu\,[[(f(\nu)-f(M))/(\nu-M)]^2 \cdot [\sigma^2+(\mu-\nu)^2]]$

$0 = 2[(f(\nu)-f(M))/(\nu-M)] \cdot [(f'(\nu)(\nu-M) - (f(\nu)-f(M)))/(\nu-M)^2] \cdot [\sigma^2 + (\mu-\nu)^2]$
$\quad + [(f(\nu)-f(M))/(\nu-M)]^2 \cdot (-2(\mu-\nu))$

$[(f'(\nu)(\nu-M) - (f(\nu)-f(M)))/(\nu-M)^2] \cdot [\sigma^2 + (\mu-\nu)^2] = [(f(\nu)-f(M))/(\nu-M)] \cdot (\mu-\nu)$

which is then solved for $\nu$, and the value substituted in the function differentiated above to obtain the bound.
Analogously, the other bound is found by solving

$[(f'(\nu)(\nu-m) - (f(\nu)-f(m)))/(\nu-m)^2] \cdot [\sigma^2 + (\mu-\nu)^2] = [(f(\nu)-f(m))/(\nu-m)] \cdot (\mu-\nu)$



4.4. Evaluating standard-deviation bounds 

The sum-of-squares bounds of section 4.1 are hard to evaluate, but we can examine the slope-based bounds
of the last section, provided we assume $\nu$ is known exactly. We are interested in knowing the largest possible
difference between the upper and lower bounds for an exact $\nu$, or the maximum of

$D(\nu) = \sigma_2[[(f(M)-f(\nu))/(M-\nu)] - [(f(\nu)-f(m))/(\nu-m)]]$
where $\sigma_2^2 = \sigma^2 + (\mu-\nu)^2$

For four of our functions ($x^2$, $x^3$, $1/x$, and $\sqrt{x}$) this is straightforward to find, treating $\sigma_2$ as fixed:

• $x^2$: $D(\nu) = \sigma_2[(\nu + M)-(\nu + m)] = \sigma_2(M-m)$, so D is constant.

• $x^3$: $D(\nu) = \sigma_2[(\nu^2 + \nu M + M^2) - (\nu^2 + \nu m + m^2)] = \sigma_2[\nu(M-m) + (M^2-m^2)]$. This has a
maximum at $\nu = M$ of $\sigma_2(M-m)(2M+m)$.

• $1/x$: $D(\nu) = \sigma_2[1/(\nu m) - 1/(\nu M)] = \sigma_2(1/m - 1/M)/\nu$. This has a maximum at $\nu = m$ of
$\sigma_2(M-m)/(m^2M)$.

• $\sqrt{x}$: $D(\nu) = \sigma_2[1/(\sqrt{\nu} + \sqrt{M}) - 1/(\sqrt{\nu} + \sqrt{m})]$, whose magnitude is
$\sigma_2(\sqrt{M}-\sqrt{m})/(\nu + (\sqrt{M}+\sqrt{m})\sqrt{\nu} + \sqrt{mM})$. This has a maximum at $\nu = m$ of
$\sigma_2[1/(2\sqrt{m}) - 1/(\sqrt{m}+\sqrt{M})]$.

For transcendental functions like $\ln(x)$ and $e^x$ we can attack the problem with an infinite series obtained
from the Taylor series expansion of the function about $\nu$; when the curve is relatively flat in the interval of
interest, the approximation will be good.
$D(\nu) = \sigma_2[((f(\nu)-f(M))/(\nu-M)) - ((f(\nu)-f(m))/(\nu-m))]$

Let us expand the first quotient in the brackets into a series.

$(f(\nu)-f(M))/(\nu-M) = [f(\nu) - [f(\nu) + (\nu-M)f'(\nu) + (\nu-M)^2f''(\nu)/2! + ...]]/(\nu-M)$
$= -[f'(\nu) + (\nu-M)f''(\nu)/2! + (\nu-M)^2f'''(\nu)/3! + ...]$

Hence

$D(\nu) = \sigma_2\sum_{i=2}^{\infty}[[(\nu-m)^{i-1}f^{(i)}(\nu)/i!] - [(\nu-M)^{i-1}f^{(i)}(\nu)/i!]]$

We need to take the derivative with respect to $\nu$ of this in order to see if it has a maximum in the interval. The
condition for the maximum is thus:

$0 = dD(\nu)/d\nu$

To approximate this we can take the first few terms:

$0 = (M-m)f'''(\nu)/2! + (2\nu(M-m)-(M+m)(M-m))f^{(4)}(\nu)/3!$
$0 = (M-m)[f'''(\nu)/2 + (2\nu-m-M)f^{(4)}(\nu)/6]$

As an example, consider $f(x) = e^x$. Then:

$0 = (M-m)[e^{\nu}/2 + (2\nu-m-M)e^{\nu}/6] = (M-m)e^{\nu}(1/2 + \nu/3 - m/6 - M/6)$

which can be solved for $\nu$.

5. Quadratic bounds on means: Taylor-series methods 

5.1. The problem

A straight line is not a very good approximation to a function with a strong curvature. An obvious next
step to improve our estimates of the mean is to construct quadratic bounds curves of the form $y = ax^2+bx+c$
and compute the mean along those:

$E[ax^2+bx+c] = a(\sigma^2+\mu^2) + b\mu + c$

However, finding quadratic bounds curves is not as easy as it might seem. We generally cannot just use the
Taylor series about some point of the curve, as with the estimates (not bounds) of [9], because while such
approximations may stay close to the curve of the actual function on some range, they may be above and
below it at different places. For instance, take the 3-term Taylor series for $f(x) = \ln(x)$ about x = 1, which is

$0 + (x-1)(1/1) + (x-1)^2(-1/1^2)/2 = -.5x^2 + 2x - 1.5$

At x = 2 this is .5, below the logarithm curve value ln(2) = .69, but at x = .5 this is -.625, above the logarithm
curve value ln(.5) = -.69. Hence the approximation curve crosses ln(x), and cannot be used as a bound on the
values of the latter.
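The crossing can be verified numerically. This short sketch evaluates the three-term Taylor polynomial quoted above on both sides of the expansion point; the function name `taylor_ln_about_1` is an assumption made for the example.

```python
import math

def taylor_ln_about_1(x):
    """Three-term Taylor polynomial of ln(x) about x = 1."""
    return -0.5 * x * x + 2.0 * x - 1.5

for x in (0.5, 2.0):
    approx, exact = taylor_ln_about_1(x), math.log(x)
    side = "above" if approx > exact else "below"
    print(x, approx, round(exact, 3), side)
```

The polynomial lies above the logarithm at x = 0.5 and below it at x = 2, so it is neither an upper nor a lower bound on an interval containing both points.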
5.2. Quadratic bounding by vertical shifting

There is a way we can use arbitrary polynomial approximations to get bounds: we can shift the
approximation curve upwards or downwards until it no longer crosses the target curve in the interval. To put
this formally for the Taylor series, we want to bound f(x) on the interval m to M by the function

$h(x) = f(t) + (x-t)f'(t) + .5f''(t)(x-t)^2 + K$

where t is some arbitrary point in the interval, and K is some constant. If we choose $t = \mu$ (for quadratic
bounds a convenient, but not necessarily best-bound point), then the mean of the approximation function is

$E[h(x)] = f(\mu) + (\mu-\mu)f'(\mu) + .5[\sigma^2 + \mu^2 - 2\mu^2 + \mu^2]f''(\mu) + K$
$= f(\mu) + .5\sigma^2f''(\mu) + K$

If we do not choose $\mu$ the formula is slightly more complicated:

$f(t) + (\mu-t)f'(t) + .5(\sigma^2+(\mu-t)^2)f''(t) + K$

Note for the particular function $f(x) = x^2$ the Taylor series has only three terms, and hence an exact formula
for the mean of the square of a set of data values is

$\mu^2 + .5\sigma^2(2) = \mu^2 + \sigma^2$

The upper and lower bounds are then found from substituting $K_U$ and $K_L$, which are respectively the
maximum and minimum values in the interval of study of the error of the approximation e(x), defined as

$e(x) = f(x) - f(t) - (x-t)f'(t) - .5(x-t)^2f''(t)$

Since the interval is finite, we cannot just find the zeros of the derivative of e(x). Zeros have to lie within the
data-value interval, and they must be compared to two other points, the function values at the maximum and
minimum of the range. In other words:

$K_U$ is $\max[e(m), e(M), e(z_1), e(z_2), ...]$
$K_L$ is $\min[e(m), e(M), e(z_1), e(z_2), ...]$

where the $z_i$ are all zeros of $e'(x)$ within the interval. To find the zeros:

$\partial e/\partial x = f'(x) - f'(t) - (x-t)f''(t) = 0$
$[f'(x)-f'(t)]/(x-t) = f''(t)$

We always know one solution of the first equation, x = t, because

$f'(t) - f'(t) = (t-t)f''(t) = 0$

But there are no other solutions for functions with constant-sign derivatives, implying no other local maxima
or minima for a Taylor-series approximation. To see this, note the equation says the slope of the chord of $f'(x)$
from t to some other point must be equal to the derivative of $f'(x)$ at t. But this cannot occur if the second
derivative of $f'(x)$ (i.e., $f'''(x)$) is constant in sign, because then each value of the first derivative of $f'(x)$
(i.e., $f''(x)$) can occur at most once.

Hence we can write the Taylor-series quadratic bound in general as (noting e(t) = 0):

upper bound: $f(t) + (\mu-t)f'(t) + .5(\sigma^2+(\mu-t)^2)f''(t) + \max(e(m), e(M), 0)$
lower bound: $f(t) + (\mu-t)f'(t) + .5(\sigma^2+(\mu-t)^2)f''(t) + \min(e(m), e(M), 0)$

For particular functions f we may be able to rule out some possibilities for the min and max. For instance,
for $f(x) = x^3$, e(x) is just the fourth Taylor-series term, $(x-t)^3 \cdot 6/6$, so $e(M) \ge 0$ and $e(m) \le 0$, and bounds are

upper bound: $t^3 + (\mu-t) \cdot 3t^2 + .5(\sigma^2+(\mu-t)^2) \cdot 6t + (M-t)^3 = 3t^2(M-\mu) + 3t[\sigma^2+\mu^2-M^2] + M^3$
lower bound: $t^3 + (\mu-t) \cdot 3t^2 + .5(\sigma^2+(\mu-t)^2) \cdot 6t + (m-t)^3 = 3t^2(m-\mu) + 3t[\sigma^2+\mu^2-m^2] + m^3$

Similarly, e(m) < 0 from analyzing the Taylor series for logarithm and square root; 0 < e(m) for reciprocal; and
0 < e(M) for antilog.

5.3. An example 

To illustrate, use our previous example of f = ln, m = 10, M = 100, $t = \mu = 23$, and $\sigma = 10$. Take the Taylor
series about $\mu$. From the preceding we know that the only possible extremes occur at m, M, and $\mu$, so note:

$e(x) = \ln(x) - [\ln(23) + (x-23)/23 - .5(x-23)^2/23^2]$
$e(m) = \ln(10) - [3.135 - .565 - .160] = 2.303 - 2.411 = -.108 = K_L$
$e(\mu) = \ln(23) - \ln(23) = 0$
$e(M) = \ln(100) - [3.135 + 3.348 - 5.604] = 4.605 - 0.879 = 3.726 = K_U$

These are the bounds offsets we have to add to the estimate of the mean of

$\ln(23) - .5 \cdot 10^2/23^2 = 3.04$

So we estimate the mean of the logarithms is 3.04, with an upper bound of 3.04 + max(-.108, 0, 3.726) = 6.77, and
a lower bound of 3.04 + min(-.108, 0, 3.726) = 2.93. The upper bound is much worse than the linear upper
bound (3.135), but the lower bound is better than the linear lower bound (2.635).
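The shifted Taylor bounds can be computed generically when the first two derivatives of f are available. The sketch below assumes, as argued in section 5.2, that the error e(x) has no interior extremum other than x = t; the function name `taylor_quadratic_bounds` and its signature are assumptions made for illustration.

```python
import math

def taylor_quadratic_bounds(f, d1, d2, m, M, mu, sigma, t):
    """Return (lower, estimate, upper) for the mean of f(x).

    d1 and d2 are the first and second derivatives of f. The error of the
    quadratic Taylor polynomial about t is checked only at m, M, and t
    (where it is zero), which is valid when f''' has constant sign.
    """
    def e(x):
        return f(x) - f(t) - (x - t) * d1(t) - 0.5 * (x - t) ** 2 * d2(t)

    estimate = f(t) + (mu - t) * d1(t) + 0.5 * (sigma ** 2 + (mu - t) ** 2) * d2(t)
    errors = (e(m), e(M), 0.0)
    return estimate + min(errors), estimate, estimate + max(errors)

t_lower, t_estimate, t_upper = taylor_quadratic_bounds(
    math.log, lambda x: 1.0 / x, lambda x: -1.0 / x ** 2,
    m=10.0, M=100.0, mu=23.0, sigma=10.0, t=23.0)
print(round(t_lower, 3), round(t_estimate, 3), round(t_upper, 3))
```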



5.4. Choosing the optimal point for the Taylor series 

The question arises as to the best value of t for getting an upper or lower bound. Analysis requires careful
preconditions, but we can often do something like this. Suppose that e(M) is the maximum value of e(x) on
the interval of study. The upper bound on the transformed mean from taking the Taylor series about t is

$f(t) + (\mu-t)f'(t) + .5[\sigma^2+(\mu-t)^2]f''(t) + e(M)$
$= f(t) + (\mu-t)f'(t) + .5[\sigma^2+(\mu-t)^2]f''(t) + [f(M) - f(t) - (M-t)f'(t) - .5(M-t)^2f''(t)]$
$= f(M) + (\mu-M)f'(t) + .5[\sigma^2+\mu^2-M^2-2\mu t+2Mt]f''(t)$

We want to minimize this upper bound with respect to t, i.e. we want:

$0 = \partial/\partial t\,[f(M) + (\mu-M)f'(t) + .5[\sigma^2+\mu^2-M^2-2\mu t+2Mt]f''(t)]$
$0 = (\mu-M)f''(t) + .5[\sigma^2+\mu^2-M^2-2\mu t+2Mt]f'''(t) + (M-\mu)f''(t)$
$0 = .5[\sigma^2+\mu^2-M^2-2\mu t+2Mt]f'''(t)$

For a function with derivatives constant in sign, this can only be zero if the expression in brackets is zero:

$0 = \sigma^2+\mu^2-M^2-2\mu t+2Mt$
$t = [M^2-\mu^2-\sigma^2]/[2(M-\mu)]$
$t = [\mu + M - \delta_M]/2$, where $\delta_M = \sigma^2/(M-\mu)$

Hence substituting back in the expression for the bound, the second derivative term must disappear, and we
get

$f(M) + (\mu-M)f'((\mu + M - \delta_M)/2)$

which is an upper bound provided e(M) > 0 and e(M) > e(m).

By similar analysis we can show that

$t = [\mu + m + \delta_m]/2$, where $\delta_m = \sigma^2/(\mu-m)$

is the best t for obtaining the other bound on e(x) on the interval of interest, leading to a lower bound of

$f(m) + (\mu-m)f'((\mu + m + \delta_m)/2)$

provided e(m) < 0 and e(m) < e(M). For $\sigma = 0$ the upper and lower bounds occur at $t = (\mu+M)/2$ and
$t = (\mu+m)/2$ respectively; and for $\sigma$ the maximum, $\sqrt{(M-\mu)(\mu-m)}$, these are both $(M+m)/2$.

So for the logarithm function (where e(m) < 0 necessarily) with $\mu = 23$, m = 10, M = 100, and $\sigma = 10$, this gives for a
lower bound $t = (23 + 10 + 100/(23-10))/2 = 20.3$, and the bound is

$f(m) + (\mu-m)f'(20.3) = \ln(10) + 13/20.3 = 2.30 + .640 = 2.94$

which is negligibly better than for the series about $\mu$, but may represent an improvement in other cases. In
general, the Taylor series approach works well for narrow intervals of interest or intervals where f(x) is rather
flat. We can, however, use order statistics to improve Taylor-series bounds; see section 10.
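The optimal expansion point for the lower bound and the resulting bound can be computed as below. This is a sketch under the stated preconditions (e(m) < 0 and e(m) < e(M)); the function names `best_t_for_lower_bound` and `optimized_taylor_lower_bound` are assumptions introduced for the example.

```python
import math

def best_t_for_lower_bound(m, mu, sigma):
    """Expansion point t = (mu + m + delta_m) / 2 with delta_m = sigma^2/(mu - m)."""
    return (mu + m + sigma ** 2 / (mu - m)) / 2.0

def optimized_taylor_lower_bound(f, d1, m, mu, sigma):
    """Lower bound f(m) + (mu - m) * f'(t) at the optimal expansion point t."""
    t = best_t_for_lower_bound(m, mu, sigma)
    return f(m) + (mu - m) * d1(t)

t_opt = best_t_for_lower_bound(10.0, 23.0, 10.0)
opt_lower = optimized_taylor_lower_bound(math.log, lambda x: 1.0 / x, 10.0, 23.0, 10.0)
print(round(t_opt, 1), round(opt_lower, 2))   # 20.3 2.94
```

Only the first derivative of f is needed here, because the second-derivative term vanishes at the optimal t.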

6. Quadratic bounds on means from Lagrange interpolation 

Taylor scries approximations deteriorate on tlic edges of an approximation interval. VVe are more 

concerned witli signed maximum deviation of the approximation from tlie function (a concept distinct from 

the Lqq approximation, which minimizes the absolute value of deviations), and a better quadratic for our 

purposes comes from Lagrange interpolation method using the Chebyshev interpolation points. For a 

quadratic we need three points to fit the curve through, giving: 

h(x) = f(p)(x-q)(x-r)/(p-q)(p-r) + f(q)(x-p)(x-r)/(q-p)(q-r) + f(r)(x-p)(x-q)/(r-p)(r-q) 

h(x) = (8/3(M-m)^2)[f(p)(x-q)(x-r) - 2f(q)(x-p)(x-r) + f(r)(x-p)(x-q)]

where p = m + (.5 - √3/4)(M-m), q = (M+m)/2, and r = m + (.5 + √3/4)(M-m)

Using our example of f = ln, m = 10, M = 100, μ = 23, and σ = 10, we have:

p = 16.029, q = 55.0, r = 93.971; ln(p) = 2.7744, ln(q) = 4.0073, ln(r) = 4.5430
h(x) = -.0002295x^2 + .04794x + 2.0648

Hence an estimate of the mean of the logarithms for this example is

-.0002295(10^2 + 23^2) + .04794(23) + 2.0648 = -.1444 + 1.1026 + 2.0648 = 3.0230

This is an estimate, not a bound. Just as with Taylor-series polynomials, we can get bounds from this from knowing the extrema (maxima and minima) of the error curve on the interval of interest. For Chebyshev (as opposed to Taylor-series) approximations there are two places in the interval where e'(x) = 0, and hence one local maximum and one local minimum. We can find these by solving the error curve derivative explicitly; for logarithm and cube this is a quadratic equation, for square root and reciprocal a cubic, and for exponential a transcendental equation. For example, for our ln(x) example:

d/dx[ln(x) - (-.0002295x^2 + .04794x + 2.0648)] = 1/x + .000459x - .04794 = 0
hence .000459x^2 - .04794x + 1 = 0

and x = [.04794 ± √(.04794^2 - .001836)] / .000918 = 28.80 and 75.64
So the extrema of e(x) on the interval can occur at only four points: m = 10, M = 100, 28.80, and 75.64. Computing e(x) there:

e(10) = -.2187, e(100) = .04137, e(28.80) = .10526, e(75.64) = -.05193

And hence the Lagrange-Chebyshev quadratic bounds on the mean of the transformed values are:

upper bound: 3.0230 + max(-.2187, .04137, .10526, -.05193) = 3.1283
lower bound: 3.0230 + min(-.2187, .04137, .10526, -.05193) = 2.8043

which are better than the linear bounds of 3.135 and 2.635 (and hence the Taylor series bounds too).
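
The Lagrange-Chebyshev computation for the logarithm can be sketched in Python as follows. This is a minimal illustration specific to f = ln (the only case here where the interior extrema come from a quadratic in closed form); the function names are hypothetical, and the coefficients are recomputed in full precision, so they agree with the rounded values in the text only to about three decimals.

```python
import math

def chebyshev_quadratic(f, m, M):
    """Coefficients (a, b, c) of the quadratic through the three Chebyshev points."""
    half, off = 0.5 * (M + m), (math.sqrt(3.0) / 4.0) * (M - m)
    nodes = [half - off, half, half + off]          # p, q, r in the text
    a = b = c = 0.0
    for i, xi in enumerate(nodes):
        xj, xk = [x for j, x in enumerate(nodes) if j != i]
        w = f(xi) / ((xi - xj) * (xi - xk))         # Lagrange weight
        a += w
        b -= w * (xj + xk)
        c += w * xj * xk
    return a, b, c

def log_mean_bounds(m, M, mu, sigma):
    """(lower, estimate, upper) for the mean of ln(x) from the quadratic fit."""
    a, b, c = chebyshev_quadratic(math.log, m, M)
    estimate = a * (sigma ** 2 + mu ** 2) + b * mu + c
    err = lambda x: math.log(x) - (a * x * x + b * x + c)
    # e'(x) = 1/x - 2ax - b = 0  <=>  2a x^2 + b x - 1 = 0
    candidates = [m, M]
    disc = b * b + 8.0 * a
    if disc >= 0.0:
        for sign in (1.0, -1.0):
            x = (-b + sign * math.sqrt(disc)) / (4.0 * a)
            if m <= x <= M:
                candidates.append(x)
    errors = [err(x) for x in candidates]
    return estimate + min(errors), estimate, estimate + max(errors)

lower, estimate, upper = log_mean_bounds(10.0, 100.0, 23.0, 10.0)
print(round(lower, 3), round(estimate, 3), round(upper, 3))
```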

7. Quadratic bounds on means: one-sided methods 

There are quadratic methods that avoid having to find the extrema of the error function in computing an approximation, by constructing approximation curves entirely above or entirely below the target function in the interval. We can do this if we can position the points of intersection of the approximation curve ax^2 + bx + c with f(x) to lie either (a) outside the interval, or (b) tangent at some point. Among our six demonstration functions, reciprocal and cube lead to cubic polynomial equations.

7.1. Intersection and tangent positioning: reciprocal

Consider reciprocal first. The error curve is
e(x) = 1/x - ax^2 - bx - c

and it can have at most three zeros which are the solutions to
0 = ax^3 + bx^2 + cx - 1

To keep the approximation curve "close", we can put a point of tangency at some t inside the interval (i.e., a double zero at t) and another zero at M. We can write the cubic -x e(x) as (x/t - 1)^2 (x/M - 1), which approaches -∞ for small x, +∞ for large x, reaches a local maximum at x = t, a local minimum at some larger x value, and then crosses zero permanently at x = M. Then we want

(x/t - 1)(x/t - 1)(x/M - 1) = ax^3 + bx^2 + cx - 1

x^3/t^2M - x^2(2/tM + 1/t^2) + x(2/t + 1/M) - 1 = ax^3 + bx^2 + cx - 1

a = 1/t^2M, b = -(2/tM + 1/t^2), c = 2/t + 1/M

So the quadratic lower bound on the mean is

(σ^2+μ^2)/t^2M - (2/tM + 1/t^2)μ + 2/t + 1/M

We are interested in the best lower bound possible, i.e. the largest. We can find this by setting to zero the partial derivative of the preceding with respect to t:

0 = -2(σ^2+μ^2)/t^3M + (2/t^2M + 2/t^3)μ - 2/t^2

0 = -(σ^2+μ^2)/M + (t/M + 1)μ - t
t = [μ - (σ^2+μ^2)/M] / (1 - μ/M)

= μ - σ^2/(M-μ) = μ - δ_M, where δ_M = σ^2/(M-μ)

So for σ = 0 this is μ; for σ a maximum, namely √[(M-μ)(μ-m)] (see section 3.6), this is m. We saw this δ_M term before in a different kind of quadratic approximation in section 5.4.

Substituting this t in the bound formula, we get a quadratic lower bound of

[(σ^2+μ^2-Mμ) + 2(μ-δ_M)(M-μ) + (μ-δ_M)^2] / M(μ-δ_M)^2

= 1/M + [(σ^2+μ^2-Mμ) + 2(Mμ-μ^2-σ^2)] / M[(μ - σ^2/(M-μ))^2]

= 1/M + (Mμ - σ^2 - μ^2) / M[(Mμ - σ^2 - μ^2) / (M-μ)]^2

= 1/M + [(M-μ)^2 / M(Mμ - σ^2 - μ^2)]

= (1/M) [[Mμ - σ^2 - μ^2 + M^2 - 2Mμ + μ^2] / (Mμ - σ^2 - μ^2)]

= (1/M) [M^2 - Mμ - σ^2] / [Mμ - μ^2 - σ^2]

= (1/M)(M-δ_M)/(μ-δ_M)

Note that when σ = 0 this is equal to 1/M * M/μ = 1/μ, the linear bound. Since μ<M, a nonzero σ will cause the denominator of the fraction to decrease proportionately more than the numerator, and hence give a lower bound greater (better) than the linear lower bound. The maximum value of σ is √[(M-μ)(μ-m)], whereupon δ_M = μ-m, and the lower bound is 1/M * [M - μ + m] / m = 1/m + 1/M - μ/mM, exactly the upper linear bound for reciprocal (see section 3.2).

Again, let's use our standard example of m = 10, M = 100, μ = 23, σ = 10, this time for the reciprocal function. Then

δ_M = 10^2/(100-23) = 1.299
And a lower bound on the mean of the reciprocals is
1/100 * (100 - 1.299) / (23 - 1.299) = .04548
This is better than the linear lower bound, calculated as 1/μ = .0435.

We can get an upper quadratic bound by only minor modifications; just create a bounding curve that crosses 1/x at m instead of M, and is tangent at t in the interval. We just substitute m for M in the preceding formulae, giving

an upper bound of (σ^2+μ^2)/t^2m - (2/tm + 1/t^2)μ + 2/t + 1/m
taken at t = μ + σ^2/(μ-m) = μ + δ_m

which can be written as (1/m)(m + δ_m)/(μ + δ_m), where δ_m = σ^2/(μ-m)

So for our example data, t = 23 + 10^2/(23-10) = 30.69, and the upper bound is 1/10 - 13/(10*30.69) = .0576. This is significantly better than the linear upper bound of (77/90)*.1 + (13/90)*.01 = .0870. Hence by using a quadratic rather than linear bound we have narrowed the range of the answer to a fraction (.0576-.0455)/(.0870-.0435) = .278 of its linear width.
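
The closed-form reciprocal bounds can be checked with a few lines of Python. The sketch below assumes the standard example values; the helper names are hypothetical. It also re-evaluates the tangent quadratic a(σ^2+μ^2) + bμ + c at t = μ - δ_M to confirm that it reproduces the simplified lower bound.

```python
def reciprocal_mean_bounds(m, M, mu, sigma):
    """Closed-form quadratic (lower, upper) bounds on the mean of 1/x."""
    delta_M = sigma ** 2 / (M - mu)
    delta_m = sigma ** 2 / (mu - m)
    lower = (M - delta_M) / (M * (mu - delta_M))
    upper = (m + delta_m) / (m * (mu + delta_m))
    return lower, upper

def tangent_quadratic_mean(t, end, mu, sigma):
    """Mean of the quadratic tangent to 1/x at t and crossing it at `end`."""
    a = 1.0 / (t * t * end)
    b = -(2.0 / (t * end) + 1.0 / (t * t))
    c = 2.0 / t + 1.0 / end
    return a * (sigma ** 2 + mu ** 2) + b * mu + c

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
lower, upper = reciprocal_mean_bounds(m, M, mu, sigma)
lower_check = tangent_quadratic_mean(mu - sigma ** 2 / (M - mu), M, mu, sigma)
upper_check = tangent_quadratic_mean(mu + sigma ** 2 / (mu - m), m, mu, sigma)
print(round(lower, 5), round(upper, 5))
```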

7.2. Evaluation of the quadratic reciprocal bounds 

We can obtain useful approximations of the quadratic bounds by replacing the quotient with the first few terms of its binomial expansion, as here for the lower bound:

(μ - δ_M)^-1 ≈ 1/μ + δ_M/μ^2 + δ_M^2/μ^3

hence (1/M)(M - δ_M)(μ - δ_M)^-1 ≈ 1/μ + (1/μ^2 - 1/Mμ)δ_M + (1/μ^3 - 1/Mμ^2)δ_M^2

= 1/μ + δ_M(1/μ - 1/M)/μ + δ_M^2(1/μ - 1/M)/μ^2

= 1/μ + σ^2/Mμ^2 + σ^4/(M-μ)Mμ^3

Hence the difference between the quadratic bounds can be approximated by

(1/m - 1/M)σ^2/μ^2 + (1/m(m-μ) - 1/M(M-μ))σ^4/μ^3

= [(M-m)σ^2/μ^2] [1/mM + (m+M-μ)σ^2/μmM(m-μ)(M-μ)]

As suggested in the previous section, the quadratic bounds are always better than the linear bounds except at the two extreme cases of σ. We can find the μ and σ for which they are least accurate. Set the partial derivative of the difference between the quadratic bounds to 0:

0 = ∂/∂μ [(1/m - 1/M)σ^2/μ^2 + (1/m(m-μ) - 1/M(M-μ))σ^4/μ^3]

0 = -2(1/m - 1/M)σ^2/μ^3 + [1/m(m-μ)^2 - 1/M(M-μ)^2]σ^4/μ^3 - 3[1/m(m-μ) - 1/M(M-μ)]σ^4/μ^4

2(1/m - 1/M) = [1/m(m-μ)^2 - 1/M(M-μ)^2]σ^2 - 3[1/m(m-μ) - 1/M(M-μ)]σ^2/μ
which can be solved iteratively.
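
To get a feel for how good the truncated expansion is, the sketch below compares it with the exact closed forms on the standard example. The function names are illustrative assumptions; the series is truncated after the σ^4 term exactly as in the text, so only rough agreement should be expected, especially for the width.

```python
def reciprocal_lower_exact(M, mu, sigma):
    delta_M = sigma ** 2 / (M - mu)
    return (M - delta_M) / (M * (mu - delta_M))

def reciprocal_upper_exact(m, mu, sigma):
    delta_m = sigma ** 2 / (mu - m)
    return (m + delta_m) / (m * (mu + delta_m))

def reciprocal_lower_series(M, mu, sigma):
    """1/mu + sigma^2/(M mu^2) + sigma^4/((M - mu) M mu^3)."""
    return (1.0 / mu
            + sigma ** 2 / (M * mu ** 2)
            + sigma ** 4 / ((M - mu) * M * mu ** 3))

def width_series(m, M, mu, sigma):
    """Approximate difference between the upper and lower quadratic bounds."""
    first = (1.0 / m - 1.0 / M) * sigma ** 2 / mu ** 2
    second = (1.0 / (m * (m - mu)) - 1.0 / (M * (M - mu))) * sigma ** 4 / mu ** 3
    return first + second

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
exact_lower = reciprocal_lower_exact(M, mu, sigma)
approx_lower = reciprocal_lower_series(M, mu, sigma)
exact_width = reciprocal_upper_exact(m, mu, sigma) - exact_lower
approx_width = width_series(m, M, mu, sigma)
print(exact_lower, approx_lower, exact_width, approx_width)
```

The lower-bound series converges quickly here because δ_M is small relative to μ; the upper-bound side uses the larger δ_m, so the width approximation is noticeably coarser.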

7.3. Intersection and tangent positioning: cube 

We can do something similar for the cube function:
e(x) = x^3 - ax^2 - bx - c

which is a third-degree polynomial just like the one for reciprocal. So we can position one intersection point and one tangency point. This time we can write e(x) as
e(x) = (x-t)^2(x-M) = x^3 - ax^2 - bx - c

hence

a = 2t+M, b = -(t^2+2tM), c = t^2M
so an upper bound on the mean is
(2t+M)(σ^2+μ^2) - (t^2+2tM)μ + t^2M
and this is a minimum when we choose a t such that
2(σ^2+μ^2) - (2t+2M)μ + 2tM = 0

t = μ - σ^2/(M-μ) = μ - δ_M

Substituting this in the equation for the bound:

(μ-δ_M)^2(M-μ) + 2(μ-δ_M)(σ^2+μ^2-Mμ) + M(σ^2+μ^2)

= μ^2M - μ^3 - 2μMδ_M + 2μ^2δ_M + δ_M^2(M-μ) + 2μσ^2 + 2μ^3 -
2μ^2M - 2δ_Mσ^2 - 2δ_Mμ^2 + 2δ_MμM + σ^2M + μ^2M

= μ^3 + δ_M^2(M-μ) + (2μ + M - 2δ_M)σ^2

= μ^3 + σ^4/(M-μ) + (2μ + M - 2σ^2/(M-μ))σ^2

= μ^3 - σ^4/(M-μ) + (2μ + M)σ^2

= μ^3 + (2μ + M - δ_M)σ^2


Similarly, a lower bound is

(2t+m)(σ^2+μ^2) - (t^2+2tm)μ + t^2m
and this is a maximum when we choose a t such that
t = μ + σ^2/(μ-m) = μ + δ_m
leading to a lower bound of
μ^3 + (2μ + m + δ_m)σ^2

Note the quadratic lower bound is always greater than the linear lower bound. The difference between the upper and lower bounds is
[M - m - δ_M - δ_m]σ^2

which provides a useful criterion for the effectiveness of these bounds. Note this is always nonnegative since

M - m - [δ_M + δ_m] = M - m - σ^2(M-m)/(M-μ)(μ-m)

= (M-m)[1 - σ^2/(M-μ)(μ-m)]

The largest possible value of σ^2 is (M-μ)(μ-m), so the quantity in brackets is always nonnegative.
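
A short Python sketch of the cube bounds follows. Besides evaluating the closed forms, it checks (as an added illustration, not something stated in the text) that each bound is attained by a two-point distribution sitting on the tangency point and the relevant endpoint, which has the required mean and variance. All names are hypothetical.

```python
import math

def cube_mean_bounds(m, M, mu, sigma):
    """Closed-form quadratic (lower, upper) bounds on the mean of x^3."""
    delta_M = sigma ** 2 / (M - mu)
    delta_m = sigma ** 2 / (mu - m)
    upper = mu ** 3 + (2 * mu + M - delta_M) * sigma ** 2
    lower = mu ** 3 + (2 * mu + m + delta_m) * sigma ** 2
    return lower, upper

def two_point_stats(t, end, mu):
    """(variance, mean of cubes) for the two-point distribution on {t, end} with mean mu."""
    w_end = (mu - t) / (end - t)          # weight placed on the endpoint
    w_t = 1.0 - w_end
    variance = w_t * (t - mu) ** 2 + w_end * (end - mu) ** 2
    return variance, w_t * t ** 3 + w_end * end ** 3

m, M, mu, sigma = 10.0, 100.0, 23.0, 10.0
lower, upper = cube_mean_bounds(m, M, mu, sigma)
var_up, cubes_up = two_point_stats(mu - sigma ** 2 / (M - mu), M, mu)
var_lo, cubes_lo = two_point_stats(mu + sigma ** 2 / (mu - m), m, mu)
print(round(lower, 1), round(upper, 1))
```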



8. Optimal quadratic bounds 

The problem of finding the best quadratic approximation for our bounding purposes may be viewed as an optimization problem in two variables. Since the quadratic curve ax^2 + bx + c leads to a bound of

upper bound: a(σ^2+μ^2) + bμ + c + max_{m≤x≤M}[f(x)-ax^2-bx-c]
lower bound: a(σ^2+μ^2) + bμ + c + min_{m≤x≤M}[f(x)-ax^2-bx-c]

and the constant c can be moved out of the maximum and minimum, we can write:

upper bound: a(σ^2+μ^2) + bμ + max_{m≤x≤M}[f(x)-ax^2-bx]
lower bound: a(σ^2+μ^2) + bμ + min_{m≤x≤M}[f(x)-ax^2-bx]

So we have two optimization problems for real a and b: to find the values that minimize the upper bound, and the values that maximize the lower bound. We have constructed a program that does this by estimating the gradient from exploratory steps, finding the zeros of the error function's derivative by the quadratic formula for logarithm and cube, and by iterative bisection for antilog, square root, and reciprocal. Comparison with the other obtained bounds is presented later in this paper. Unfortunately, the extrema appear to be "broad", and convergence is slow, so the other methods discussed in this paper seem clearly desirable in most cases. While these other methods cannot usually get the tightest bounds, the difference is usually not much.

A strict local minimum of the upper bound found by the optimization process is guaranteed to be the global minimum over all quadratic curves, because the function being optimized is convex (and likewise a local maximum of the lower bound is global, because that function is concave). To see this, note for the upper bound for instance, for 0 ≤ θ ≤ 1,

(θa_1+(1-θ)a_2)(σ^2+μ^2) + (θb_1+(1-θ)b_2)μ
+ max_{m≤x≤M}[f(x)-(θa_1+(1-θ)a_2)x^2-(θb_1+(1-θ)b_2)x]

≤ θ[a_1(σ^2+μ^2) + b_1μ + max_{m≤x≤M}[f(x)-a_1x^2-b_1x]]
+ (1-θ)[a_2(σ^2+μ^2) + b_2μ + max_{m≤x≤M}[f(x)-a_2x^2-b_2x]]

since max(f(x)+g(x)) ≤ max(f(x)) + max(g(x)).

For our standard example, we found the optimal quadratic bounds to be 3.00 and 3.10.
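
The two optimization problems can be explored with a very crude derivative-free search. The Python sketch below is not the program described above: it is an assumed stand-in that scans the error on a grid (instead of solving for the extrema) and uses a simple compass search started from the Lagrange-Chebyshev coefficients of section 6. It only illustrates that the bounds can be tightened from that starting point; it makes no claim to reach the optimum.

```python
import math

def quadratic_bound(a, b, side, f=math.log, m=10.0, M=100.0,
                    mu=23.0, sigma=10.0, n=1000):
    """Approximate bound for the curve ax^2 + bx, scanning the error on a grid."""
    errs = [f(x) - a * x * x - b * x
            for x in (m + (M - m) * i / n for i in range(n + 1))]
    extreme = max(errs) if side == "upper" else min(errs)
    return a * (sigma ** 2 + mu ** 2) + b * mu + extreme

def compass_search(objective, a, b, step_a, step_b, iters=100):
    """Minimize objective(a, b): try axis steps, halve the steps when none helps."""
    best = objective(a, b)
    for _ in range(iters):
        moved = False
        for da, db in ((step_a, 0.0), (-step_a, 0.0), (0.0, step_b), (0.0, -step_b)):
            trial = objective(a + da, b + db)
            if trial < best:
                a, b, best, moved = a + da, b + db, trial, True
        if not moved:
            step_a, step_b = step_a / 2.0, step_b / 2.0
    return a, b, best

a0, b0 = -0.0002295, 0.04794          # Lagrange-Chebyshev starting point
start_upper = quadratic_bound(a0, b0, "upper")
start_lower = quadratic_bound(a0, b0, "lower")
_, _, upper = compass_search(lambda a, b: quadratic_bound(a, b, "upper"),
                             a0, b0, 1e-5, 1e-3)
_, _, neg_lower = compass_search(lambda a, b: -quadratic_bound(a, b, "lower"),
                                 a0, b0, 1e-5, 1e-3)
lower = -neg_lower
print(start_lower, start_upper, lower, upper)
```

Because the objective is nonsmooth along the ridge where two error extrema tie, axis-aligned steps can stall well short of the optimum, which is consistent with the slow convergence noted above.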

9. Improving accuracy with outliers and statistics on subsets 

We can tighten bounds if we know additional information about a set of data values. We may know a few extreme values on the range (outliers), and be able to remove these points from the analysis of the rest of the points. This helps a good deal when m and/or M are unusually unrepresentative of the distribution (and notice how frequently we have used m and M in our formulas). With the outliers removed, the remaining values can have a narrower range, on which the function can be better matched by a linear or quadratic approximation. The transformed values for the known outliers can then be added to the total mean or total variance in a final step.

But we can generalize this. We can improve accuracy of bounds any time we know means and variances of arbitrary subsets of the original data values. We may then estimate statistics on the transformed values for each subset and combine them with the appropriate weighting.

9.1. An example

For instance, from [8], there were 6133 merchant ships with United States registry in 1982, of an average gross tonnage of 3120 per ship. Of these, 2941 were fishing vessels, of average tonnage 199.6 gross tons; 548 were cargo ships, of average tonnage 9790 tons; 361 were tankers, of average tonnage 2670 tons. Hence there were 6133 - 2941 - 548 - 361 = 2283 other ships of average tonnage [(6133*3120) - (2941*200) - (548*9740) - (361*2670)]/2283 = [19,130,000 - 588,000 - 5,340,000 - 965,000] / 2283 = 5320 tons.

Now suppose we want the mean of the logarithms of the tonnage values. Consider the upper bounds on each of the four disjoint subsets. These are just the logarithms of the means, or 5.30, 9.21, 7.88, and 8.57. Hence the total upper bound is the weighted mean of these upper bounds, or [(5.30*2941) + (9.21*548) + (7.88*361) + (8.57*2283)] / 6133 = 7.018. This should be compared with the upper bound derived from the mean of the entire set, ln(3120) = 8.05, so the subdivision data gave us a significant improvement.

Unfortunately, we do not know anything about the maximum and minimum tonnage of classes of ships, so we cannot get a cumulative lower bound. However, we know m = 100 for this table, and M = 200,000 is a reasonable figure from knowledge of merchant shipping, so a global lower bound is found by
α = (3120-100)/(200000-100) = .0151

lower bound is ln(100) + α(ln(200000)-ln(100)) = 4.60 + .0151*7.60
= 4.60 + .115 = 4.715
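
The subset computation can be reproduced along these lines in Python. The sketch takes the counts and averages exactly as quoted in the first sentence of the example (including 9790 for cargo ships), so its intermediate figures differ slightly from the rounded hand arithmetic; the dictionary layout and variable names are illustrative assumptions.

```python
import math

# (count, mean tonnage) for the subsets quoted in the text
groups = {"fishing": (2941, 199.6), "cargo": (548, 9790.0), "tankers": (361, 2670.0)}
total_count, total_mean = 6133, 3120.0

# Remaining ships: count and mean follow from the totals
other_count = total_count - sum(n for n, _ in groups.values())
other_sum = total_count * total_mean - sum(n * mean for n, mean in groups.values())
groups["other"] = (other_count, other_sum / other_count)

# Linear upper bound per subset is ln(mean); combine with count weights
subset_upper = sum(n * math.log(mean) for n, mean in groups.values()) / total_count
global_upper = math.log(total_mean)

# Global linear lower bound from the secant between m and M
m, M = 100.0, 200000.0
alpha = (total_mean - m) / (M - m)
global_lower = math.log(m) + alpha * (math.log(M) - math.log(m))
print(other_count, subset_upper, global_upper, global_lower)
```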

9.2. Proof of desirability of subdivision for linear bounds 

It can be proved that linear bounds on the mean are never worsened by using such subset statistics. This can be seen graphically in figure 9-1. We consider here the case of binary subdivision, and further subdivisions can be covered by extension. We also consider only functions concave downwards, but the other case can be handled analogously.

First consider the lower bound. If the ranges of the subdivisions are the same as the full set, then the two lower bounds must lie along the same line, and their weighted average must lie along the line too; hence the lower bound of the full set is exactly the weighted average of the two lower bounds. If one or both of the subsets has a narrower range of values than the full set, this can only increase (improve) the lower bound since a secant across a subrange lies fully above a secant across a range containing the subrange. Hence the lower bound cannot get any worse in this subdivision summation of linear lower bounds.

The upper bound also cannot be any worse. This time range reduction within a subset does not matter because the upper bound is constrained to lie along the curve of the function, which is independent of where it is sliced. The weighted average of the two subset upper bounds is a point along the line connecting two points on the function curve. But since the function is concave downwards, this point is always below the function. But since the upper bound on the full set is constrained to lie on the curve, the subdivision process always guarantees a better upper bound as long as the two subdivision means are different, and no worse if they are not different.

Figure 9-1: Improvements in linear bounds from combining statistics on two disjoint sets

10. Exploiting order statistics as well 

So far we have only assumed knowledge of the maximum, minimum, mean, and (sometimes) standard deviation of sets of data values. If we have additional statistics on the data values we can do a better job of estimating statistics on the transformed values. In this section we discuss using order statistics (e.g. medians and percentiles). Order statistics have the nice property that they have one-to-one mappings from the original data values to the transformed values under the monotonic transformations we are assuming.

10.1. Using the median

First, assume we know a median in addition to the maximum, minimum, and mean. We can often get an immediate improvement in the bounds on estimates. Let the error curve (linear, quadratic, or whatever) be e(x). Then the median can be thought to partition the points into two equal-sized subranges (assume the number of points to be large enough so that even numbers of points don't bother us). Then an upper bound on the mean of the transformed values is the estimate given by the approximation curve plus one half the maximum of the error curve in the range to the left of the median plus one half the maximum of the error curve in the range to the right of the median. The lower bound on the mean is found substituting "minimum" for "maximum" in the above rule. Thus knowing the median decreases the influence of extrema of the error curve.

10.2. Other order statistics 

We can generalize these ideas to the situation where we know arbitrary order statistics on the original distribution. Denote these statistics as r pairs of the form <x_i,f_i>, where fraction f_i of the items in the distribution are claimed to lie to the left of value x_i. Then we can generalize the formula of section 5 as follows:

upper bound is <estimate from approximation curve> - Σ_{i=1..r+1} [(f_i - f_{i-1}) * min_{x_{i-1}≤x≤x_i} e(x)]

lower bound is <estimate from approximation curve> - Σ_{i=1..r+1} [(f_i - f_{i-1}) * max_{x_{i-1}≤x≤x_i} e(x)]

where e(x) is the error curve a(x)-f(x), x_0 is defined as m, with f_0 = 0, and x_{r+1} is defined as M (with corresponding f of 1). Thus the effects of the extreme points of e(x) are "diluted" by their fractional coefficients, and the more order statistics are known, the tighter the eventual bounds.

Under certain circumstances we can simplify the above formulae considerably. If we know even-subdivision order statistics (i.e., f_i = i/r, r the number of order statistics), and if the error curve e(x) is monotonic, then the maximum and minimum of e(x) in each subinterval between the order statistic ordinates x_i must lie at the endpoints. So if e(x) is monotonic increasing, the upper bound is [Σ_{i=1..r} e(x_i)]/r and the lower bound is [Σ_{i=0..r-1} e(x_i)]/r; and vice versa if e(x) is monotonic decreasing. Hence the absolute range between the upper bound and lower bound is always the same number, |e(x_r)-e(x_0)|/r = |e(M)-e(m)|/r. (Note that Taylor-series quadratic approximations are monotonic if e(m)<0<e(M) or e(M)<0<e(m), conditions which occur frequently.)
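
The order-statistic rule can be sketched in Python as below. The sketch assumes the sign convention e(x) = a(x) - f(x), so the upper bound subtracts the weighted minima and the lower bound the weighted maxima; extrema are located by scanning a grid on each subinterval, which is exact for a monotonic error curve. The quartile positions 15, 20, 30 are made-up illustrative values, not data from the text.

```python
import math

def order_statistic_bounds(estimate, err, breakpoints, fractions, n=200):
    """Bounds on the transformed mean given order statistics.

    breakpoints = [m, x_1, ..., x_r, M]; fractions = [0, f_1, ..., f_r, 1];
    err(x) = a(x) - f(x) is the approximation error curve.
    """
    upper = lower = estimate
    for i in range(1, len(breakpoints)):
        left, right = breakpoints[i - 1], breakpoints[i]
        weight = fractions[i] - fractions[i - 1]
        values = [err(left + (right - left) * k / n) for k in range(n + 1)]
        upper -= weight * min(values)
        lower -= weight * max(values)
    return lower, upper

mu, sigma, m, M = 23.0, 10.0, 10.0, 100.0
# Quadratic Taylor approximation of ln about mu, and its mean estimate
approx = lambda x: math.log(mu) + (x - mu) / mu - (x - mu) ** 2 / (2 * mu ** 2)
err = lambda x: approx(x) - math.log(x)
estimate = math.log(mu) - sigma ** 2 / (2 * mu ** 2)

plain = order_statistic_bounds(estimate, err, [m, M], [0.0, 1.0])
quartiles = order_statistic_bounds(estimate, err,
                                   [m, 15.0, 20.0, 30.0, M],
                                   [0.0, 0.25, 0.5, 0.75, 1.0])
print(plain, quartiles)
```

With evenly spaced fractions and a monotonic error curve, the width of the second interval is one quarter of the first, matching the |e(M)-e(m)|/r rule.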

10.3. Order statistics and the standard deviation 

Order statistics are also helpful in estimating the standard deviation of the transformed values, especially order statistics for the leftmost and rightmost subranges of the interval. Recalling the bounds lines drawn through the mean of the transformed values in section 4.2, we had to draw them so they lay entirely above the curve to one side of the mean, and entirely below on the other side, and this is a highly conservative assumption. Assume ν is known precisely. We could probably get a better bound if we knew how many points lay to the left of some x_1 and then drew a secant of f(x) from the transform mean to it, rather than from the transform mean to m; or if we knew how many points lay to the right of some x_r, and drew a secant from the transform mean to it instead of M. See figure 10-1.

The estimate of the standard deviation of the transformed values obtained from these lines is just their slope times the original standard deviation. But to get a bound, we need a correction for the points lying more extreme than the new point of intersection. Consider the example of a curve concave downwards like logarithm, and take the upper bound line from the transform mean to some point to the left; call the point x_1, and let it be an order statistic so that fraction p of the distribution lies to the left of it. Assume the mean of the transformed values is known exactly. Then the correction for a bound corresponds to the situation where all the p points are at m, which means a difference in the variance of
p*[(f(ν)-f(m))^2 - [(ν-m)*(f(ν)-f(x_1))/(ν-x_1)]^2]

where ν is the number which maps functionally to the mean of the transformed values. Hence the expression for the upper bound on the standard deviation is

[[σ^2+(μ-ν)^2][(f(ν)-f(x_1))/(ν-x_1)]^2 + p*(f(ν)-f(m))^2 - p*[(ν-m)*(f(ν)-f(x_1))/(ν-x_1)]^2]^.5

So using such a bounds line can give a better slope, but one pays a penalty of a correction term which subtracts from the slope improvement. An obvious question is under what conditions use of the order statistic helps. It turns out this has a surprising answer when ν is known exactly. Denote the two slopes as s_m and s_1, i.e.

s_m = (f(ν)-f(m))/(ν-m), s_1 = (f(ν)-f(x_1))/(ν-x_1)
we can rewrite our expression for the square of the upper bound as

[σ^2+(μ-ν)^2]*s_1^2 + p*s_m^2*(ν-m)^2 - p*s_1^2*(ν-m)^2

Figure 10-1: Exploiting order statistics for a better bound on the standard deviation

This will represent an improvement on the square of the linear upper bound, [σ^2+(μ-ν)^2]*s_m^2, if

[σ^2+(μ-ν)^2]*s_m^2 > [σ^2+(μ-ν)^2]*s_1^2 + p*s_m^2*(ν-m)^2 - p*s_1^2*(ν-m)^2
or [σ^2+(μ-ν)^2][s_m^2 - s_1^2] > p*[s_m^2 - s_1^2]*(ν-m)^2

So the slope terms cancel, and use of the order statistic <x_1,p> is going to be helpful when:

[σ^2+(μ-ν)^2] > p*(ν-m)^2

or p < [(σ^2+(μ-ν)^2)/(ν-m)^2]

This result is independent of where the order statistic is within the distribution (x_1), and depends only on the standard deviation and minimum of the original distribution, and the mean of the transformed values. The corresponding result for the rightmost order statistic is
p < [(σ^2+(μ-ν)^2)/(M-ν)^2]
where p is the fraction of items to the right of x_r.

If we know other order statistics than just the leftmost and rightmost (x_1 and x_r) we can get better bounds, though predicting the improvement is difficult. For instance, if we know x_2, we can take a line from the transform mean to x_2 and estimate the contribution to the correction factor from the items between x_1 and x_2 differently than the contribution of items between m and x_1.

10.4. Adjustment of standard deviation for an inexact transform mean 

If we do not know the exact mean of the transformed values, f(ν), we must adjust these results. Let the bounds on the transform mean correspond to ν_L and ν_U as in section 4.3. Assume f(x) has a negative second derivative. The formula for the upper bound is

[[σ^2+(μ-ν)^2][(f(ν)-f(x_1))/(ν-x_1)]^2 + p*(f(ν)-f(m))^2 - p*[(ν-m)*(f(ν)-f(x_1))/(ν-x_1)]^2]^.5

The factor [σ^2+(μ-ν)^2] is monotonically decreasing with ν in its range. The rest of the expression is the difference of a term and the difference of two others. The first term is monotonically decreasing with increasing ν since the second derivative of the curve is negative. The correction represents the second moment of items grouped at m on the curve. As ν increases, the possible distance these items could be off the bound line increases, and their relative weight increases as f(ν) becomes relatively larger than f(m). Hence since this correction term is subtracted from the slope, the effect as ν increases will be for all the terms to decrease.

Hence the adjusted value for the upper bound on the standard deviation of the transformed values is just

[[σ^2+(μ-ν_L)^2][(f(ν_L)-f(x_1))/(ν_L-x_1)]^2 + p*(f(ν_L)-f(m))^2
- p*[(ν_L-m)*(f(ν_L)-f(x_1))/(ν_L-x_1)]^2]^.5

substituting ν_L for ν in the exact-ν formula.

Similarly, we substitute ν_U for ν to get an adjusted lower bound. Analogously, we handle curves with a positive second derivative by substituting ν_U for ν for an upper bound, ν_L for ν for a lower bound.

10.5. Quasi-order statistics from the standard deviation

If we know the mean and standard deviation of a set of data values, we can use Chebyshev's inequality to bound the number of items lying more than a certain distance from the mean. This information is like an order statistic, but since it only represents an upper bound on the number of items in a region and not an exact number of items, it must be used carefully. It can only be used for partitions of the interval of interest into two parts, the subinterval of points farther than a certain distance to the left (or right) of the mean, and a subinterval of all other points of the interval. It can also only be used for an upper bound on the mean of the transformed values, given μ, when e(x) has a maximum on the first subinterval that is more than the maximum on the second, or for a lower bound when e(x) has a minimum on the first subinterval that is less than the minimum on the second.

Actually, Chebyshev's inequality in the standard form (that only a fraction σ^2/D^2 of the points of a distribution can lie greater than distance D units from the mean) is not the best inequality we can get, since it refers to both tails of a distribution, and we are only concerned with the number of points in one tail. Only a fraction σ^2/(σ^2+D^2) of the points can lie to the left of a point D to the left of the mean, or lie to the right of a point D to the right of the mean. To see this, note that if fraction f of the points lie to the left of a point D units to the left of the mean, then their weighted second moment about the mean is at least fD^2, which must be less than σ^2. But in order for the mean to be at the place it is, this fraction f of the points must be compensated for by (1-f) points R units to the other side of the mean. For maximal f, these other (1-f) points must all be at the same location, for otherwise they would have a nonzero variance which plus their mean would add to the variance of the whole distribution, and would require a lower maximum f. Hence we have two equations to solve simultaneously:

fD^2 + (1-f)R^2 = σ^2
fD - (1-f)R = 0

which imply

R = Df/(1-f), fD^2/(1-f) = σ^2, f = σ^2/(σ^2+D^2)
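
The one-tailed bound is easy to sanity-check numerically. The Python sketch below builds the extremal two-point distribution from the two simultaneous equations and confirms that it has the required mean and variance; the function names are hypothetical and the values μ = 23, σ = 10, D = 5 are borrowed from the standard example purely for illustration.

```python
def max_tail_fraction(sigma, D):
    """Largest fraction of points at least D beyond the mean on one side."""
    return sigma ** 2 / (sigma ** 2 + D ** 2)

def extremal_two_point(mu, sigma, D):
    """Two-point distribution [(value, weight), ...] that attains the maximal fraction."""
    f = max_tail_fraction(sigma, D)
    R = D * f / (1.0 - f)          # distance of the compensating mass from the mean
    return [(mu - D, f), (mu + R, 1.0 - f)]

mu, sigma, D = 23.0, 10.0, 5.0
points = extremal_two_point(mu, sigma, D)
mean = sum(x * w for x, w in points)
variance = sum(w * (x - mean) ** 2 for x, w in points)
print(points, mean, variance)
```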



Using this result, we then can put bounds on the mean of the transformed values of

upper bound: f(μ) + .5σ^2 f''(μ)
+ (σ^2/(σ^2+D^2)) * max_{m≤x≤μ-D}(e(x))
+ (D^2/(σ^2+D^2)) * max_{μ-D≤x≤M}(e(x)),
provided the first max value is greater than the second

lower bound: f(μ) + .5σ^2 f''(μ)
+ (σ^2/(σ^2+D^2)) * min_{m≤x≤μ-D}(e(x))
+ (D^2/(σ^2+D^2)) * min_{μ-D≤x≤M}(e(x)),
provided the first min value is less than the second

These are the left-sided bounds; we can also get analogous expressions for bounds using points on the right of a distribution. Unfortunately, we cannot find optimal values of D for these formulas in general because the derivative cannot be applied to them.

Note that while it may be difficult to determine for an arbitrary e(x) whether the maximum in one interval is greater than in another, the Taylor-series quadratic approximation often has this property for either the left-side or right-side rule.

10.6. Evaluation of quasi-order statistics from the standard deviation 

Let us return to the analysis in section 5.3 of our standard example with the quadratic Taylor series approximation at μ. Choose as subintervals 10<x<33 and 33<x<100, so D = 33-23 = 10 = σ. Since the error curve is monotonically increasing (e(m)<e(μ)<e(M), and no e'(x) = 0 except μ) the maxima on the subintervals are at the rightmost points, and the minima at the leftmost. Hence the maxima are e(33) = 3.50-(3.14+.435-.106) = .03 and e(100) = 3.7. Similarly for the other bound, choose D = 5, 10<x<18, and 18<x<100; and the minima are e(10) = -.12 and e(18) = 2.89-(3.14-.217-.023) = -.01. The maximum fraction f for x = 33 is 10^2/(10^2+10^2) = .5, and for x = 18 is 10^2/(10^2+5^2) = .8. Hence the revised bounds on the mean of the transformed values are

lower: 3.06 - .5*.03 - .5*3.7 = 1.20
upper: 3.06 - .8*-.12 - .2*-.01 = 3.16

which are better than the bounds obtained in section 5.3.

D is a parameter here that can vary arbitrarily. Let us find the best value for it, for the case of a Taylor series approximation where e(x) increases with x, and a lower bound:

0 = ∂/∂D[(σ^2/(σ^2+D^2)) * e(μ-D) + (D^2/(σ^2+D^2)) * e(M)]

0 = ∂/∂D[[σ^2*e(μ-D) + D^2*e(M)]/(σ^2+D^2)]
so [σ^2 ∂/∂D[e(μ-D)] + 2D*e(M)] * (σ^2+D^2) = [σ^2*e(μ-D) + D^2*e(M)] * 2D
where e(μ-D) = f(μ-D) - f(μ) + Df'(μ) - .5D^2f''(μ) and ∂/∂D[e(μ-D)] = -f'(μ-D) + f'(μ) - Df''(μ),
which we can solve by iterative methods to find the best value of D.

10.7. Splines and order statistics 

We have not referred to spline approximations in the preceding analysis because if an approximation curve is divided into pieces with different properties then we must know how many data points are in each to calculate means and standard deviations on the transformed values. One might think that for a given set of order statistics on a distribution we may be able to create a spline approximation broken at the points at which the order statistics are sited, and use that for bounding, but we still need to know means of every subinterval, the knowledge discussed in section 9, which may be difficult to obtain. Thus splines may be difficult to use.

11. Using fits to known distributions

As a final kind of information which we might have about a set of values, we might know that their distribution is close to some well-known distribution, with a certain allowed tolerance. If the tolerance is small we can expect quite tight bounds on the transformed values. But estimating statistics this way requires special preparation in advance (namely, measuring fits to a predicted distribution), and is not possible with most data presented in already-aggregated units.

11.1. General formula for known distributions 

A well-known result (e.g. [3], section 7.3) gives the distribution of the transform of some probability distribution p(x), under the transformation function f(x), as
q(y) = p(f^-1(y)) * |df^-1(y)/dy|

as a function of y, provided f is either monotonically increasing or decreasing in the interval.

So for instance if our p(x) approximates a uniform distribution on the interval m to M, q(y) = (1/(M-m)) * |df^-1(y)/dy|. For f(x) = ln(x), q(y) = e^y/(M-m) on the interval y = ln(m) to y = ln(M); an estimate of the mean of q(y) is

∫y q(y)dy / ∫q(y)dy = [(ln(M)-1)M - (ln(m)-1)m] / (M-m) = -1 + [M ln(M) - m ln(m)]/(M-m)

and an estimate of the second moment about zero is

∫y^2 q(y)dy / ∫q(y)dy = [M[ln(M)*ln(M) - 2 ln(M) + 2] -
m[ln(m)*ln(m) - 2 ln(m) + 2]] / (M-m)

which minus the square of the estimate of the mean gives an estimate of the variance.
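
The closed forms for the logarithm of a uniformly distributed variable can be cross-checked against direct numerical integration. The Python sketch below does this with a plain midpoint rule; the function names and the choice m = 10, M = 100 are illustrative assumptions.

```python
import math

def uniform_log_moments(m, M):
    """Closed-form mean and variance of ln(x) for x uniform on [m, M]."""
    mean = (M * math.log(M) - m * math.log(m)) / (M - m) - 1.0
    g = lambda x: x * (math.log(x) ** 2 - 2.0 * math.log(x) + 2.0)
    second = (g(M) - g(m)) / (M - m)
    return mean, second - mean ** 2

def uniform_log_moments_numeric(m, M, n=100000):
    """Same quantities by midpoint-rule integration over x."""
    h = (M - m) / n
    logs = [math.log(m + (k + 0.5) * h) for k in range(n)]
    mean = sum(logs) / n
    second = sum(v * v for v in logs) / n
    return mean, second - mean ** 2

mean, variance = uniform_log_moments(10.0, 100.0)
mean_num, variance_num = uniform_log_moments_numeric(10.0, 100.0)
print(mean, variance, mean_num, variance_num)
```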

For p(x) uniform, f(x) = 1/x, q(y) = 1/y^2(M-m) on the interval y = 1/M to y = 1/m; an estimate of the mean of q(y) is

[ln(1/m)-ln(1/M)]/(M-m) = ln(M/m)/(M-m)

and an estimate of the second moment about zero is (1/m - 1/M)/(M-m) = 1/mM, hence an estimate of the variance is

1/mM - [ln(M/m)/(M-m)]^2

11.2. Handling inexact fits to distributions

We have not addressed how to get bounds on means and standard deviations. We can do this by defining an "upper fit" and "lower fit" on the discrete set of n values such that

ω_U = max_i [x_i - g_i], ω_L = min_i [x_i - g_i]

where ∫_{-∞}^{g_i} p(x)dx = (i-.5)/n, and p(x) is the distribution the x_i fit to.
In other words, the fits are the maximum and minimum deviations of an x_i from its value predicted by the approximating distribution p(x).

We can exploit die assumed fact that fix) is monotonically increasing or decreasing to say diat the 
maximum and minimum of die mean of the transfomied values occur when die x. are all at to or all at co^ 
from their predicted positions, not necessarily respectively. This is because less than an extreme deviation for 
one point cannot improve prospects for a more extreme mean; all point deviations are independent of one 
another, widiin the tolerances. Hence to find the extreme values of the transfomied mean one just calculates 
the means of 

Qu(y) = p[f'^(y)-<^u] * and 

Ql (y) = p[f'\y)-WLl * |df'ky)/dy| 

We can use this same approach to get bounds on the standard deviation in the manner of section 4.1. We
just define g(x) = [f(x)]^2 as a new transformation function, and compute the above formulae with g instead
of f. We then compute bounds on the mean, square them, and subtract this interval from the interval
computed on the mean of g(x).
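The procedure of this section can be sketched as follows. This is an illustration under stated assumptions, not the original program: the fitted distribution is supplied through its quantile function (so the predicted position of the i-th sorted point is quantile((i-.5)/n)), the data values are hypothetical, and the means of the two shifted distributions are obtained by numerical averaging rather than in closed form.

```python
import math

def fit_offsets(xs, quantile):
    """Upper and lower fit: extreme deviations of the sorted data from predicted positions."""
    xs = sorted(xs)
    n = len(xs)
    devs = [x - quantile((i - 0.5) / n) for i, x in enumerate(xs, start=1)]
    return max(devs), min(devs)

def transformed_mean_interval(f, quantile, w_upper, w_lower, steps=100000):
    """Interval for the mean of f(x) when every point sits at w_upper or at w_lower
    from its predicted position; f is assumed monotonic on the shifted range."""
    def shifted_mean(w):
        # Mean of f over the fitted distribution shifted by w (midpoint rule in probability).
        return sum(f(quantile((k + 0.5) / steps) + w) for k in range(steps)) / steps
    a, b = shifted_mean(w_upper), shifted_mean(w_lower)
    return min(a, b), max(a, b)

def uniform_quantile(u):
    """Quantile function of an even distribution on 10 to 100 (illustrative choice)."""
    return 10.0 + 90.0 * u

data = [20.0, 36.0, 57.0, 70.0, 92.0]      # hypothetical data values
w_u, w_l = fit_offsets(data, uniform_quantile)
lo, hi = transformed_mean_interval(math.log, uniform_quantile, w_u, w_l)
print(w_u, w_l)
print(lo, hi)
```

Working through the quantile function avoids inverting f explicitly: the mean of q_U(y) is the same as the mean of f(g + ω_U) with g drawn from p.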

11.3. Example of inexact distribution fit

Suppose we know the distribution of x_i fits an even distribution on the interval 10 to 100, to such an extent
that a point is never further than 2 units in advance of where it would be in a perfectly even distribution, and
never more than 3 units behind. Then the maximum-mean distribution is a uniform distribution from 12 to
102, and the minimum-mean distribution is a uniform distribution from 7 to 97. Suppose we want to find the
mean of the logarithms of these data values. Using the formulae we obtained in section 11.1, the mean of the
first distribution is [102 ln(102) - 12 ln(12) - 102 + 12] / (102-12) = (471.7 - 29.8)/90 - 1 = 4.91 - 1 = 3.91; and
the mean of the second distribution is [97 ln(97) - 7 ln(7) - 97 + 7] / (97-7) = (443.7 - 13.6)/90 - 1 = 4.78 - 1 =
3.78. Hence the mean of the transformed values is between 3.78 and 3.91, corresponding to antilogs of 44 and
50. Note the mean of the original values must lie between (97 + 7)/2 = 52 and (102 + 12)/2 = 57.

For an estimate of the standard deviation we use the formula previously derived for an estimate of the
second moment about zero, namely

[M[ln(M)*ln(M) - 2 ln(M) + 2] - m[ln(m)*ln(m) - 2 ln(m) + 2]] / (M-m)

= [M(ln(M)-1)^2 - m(ln(m)-1)^2] / (M-m) + 1

For the uniform distribution 12 to 102, this is

[102(3.625)^2 - 12(1.485)^2]/90 + 1 = (1340 - 26.5)/90 + 1 = 15.6

and for the uniform distribution 7 to 97 this is

[97(3.575)^2 - 7(.946)^2]/90 + 1 = (1240 - 6.3)/90 + 1 = 14.7

From the previous paragraph we know bounds on the mean of the transformed values are 3.78 and 3.91,
hence bounds on the square of the mean are 14.3 and 15.3. Hence bounds on the variance are 15.6 - 14.3 = 1.3
and max(14.7 - 15.3, 0) = 0. Hence bounds on the standard deviation of the transformed values are 1.14 and
0.
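The arithmetic of this example can be re-run in a few lines. This is a sketch that simply evaluates the section 11.1 closed forms at the two shifted intervals; the function and variable names are illustrative, and results computed at full precision may differ in the last digit from figures rounded by hand at intermediate steps.

```python
import math

def log_mean(m, M):
    """Mean of ln(x) for x uniform on [m, M]."""
    return (M * math.log(M) - m * math.log(m)) / (M - m) - 1.0

def log_second_moment(m, M):
    """Second moment about zero of ln(x) for x uniform on [m, M]."""
    return (M * (math.log(M) - 1) ** 2 - m * (math.log(m) - 1) ** 2) / (M - m) + 1.0

# Maximum-mean case: every point 2 units ahead; minimum-mean case: 3 units behind.
mean_hi = log_mean(12.0, 102.0)
mean_lo = log_mean(7.0, 97.0)
sq_hi = log_second_moment(12.0, 102.0)
sq_lo = log_second_moment(7.0, 97.0)

# Interval subtraction: largest second moment minus smallest squared mean, and vice versa.
var_hi = sq_hi - mean_lo ** 2
var_lo = max(sq_lo - mean_hi ** 2, 0.0)

print(round(mean_lo, 2), round(mean_hi, 2))        # about 3.78 and 3.91
print(math.sqrt(var_lo), math.sqrt(var_hi))
```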

12. Small populations 

Thus far we have not made use of the size of the data population being analyzed. This is only significant if
the population is particularly small, in which case the known maximum M and minimum m (and the median
and mode too, if known) are a nonnegligible proportion of the points of the distribution. For instance, the
linear bounds represent in general the two extreme cases where (a) all the points are grouped at the mean, and
(b) all the points are at the maximum and the minimum. Knowledge of M and m thus decreases the distance
between linear bounds by a factor of 2/n, n the size of the data population, since it represents a weighted
modification of case (a) by two points from case (b).

13. Some experimental comparisons of the various bounds formulae

We have run some simple experiments on the effectiveness of our bounds formulae on the mean of the
transformed values. We wrote programs in INTERLISP-VAX. We used two test functions, f(x)=ln(x) and
f(x)=1/x. For the experiments we computed upper and lower bounds derived the following ways:

• simple linear bounds (section 3) 

• Taylor-series quadratic bounds, series around the mean (section 5)

• Lagrange-Chebyshev interpolation quadratic bounds (section 6)

• For the reciprocal only, the one-sided quadratic bounds (section 7)

• Order-statistic bounds from the Chebyshev inequality, using a Taylor series around the mean
(section 10.5)

• Best quadratic bounds found by explicit optimization on quadratic coefficients a and b (section 8): 

upper bound: a(σ^2+μ^2) + bμ + c + max_{m≤x≤M} [f(x)-ax^2-bx-c]
lower bound: a(σ^2+μ^2) + bμ + c + min_{m≤x≤M} [f(x)-ax^2-bx-c]

We discovered that our results for optimal bounds for the reciprocal curve were identical (except
for roundoff error) to those for one-sided bounds, so we have omitted the former from the
reciprocal table. Unfortunately, we have been unable to prove the connection (that is, that the
one-sided bounds are indeed the optimal ones), though we strongly suspect it.
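For fixed coefficients a and b, the quadratic bound expressions above can be evaluated directly; the constant c cancels, so it is omitted. The sketch below is illustrative only: the maximum and minimum over the interval are approximated on a grid (an assumption, so the extremes are slightly understated), the data set is hypothetical, and a and b are taken from a Taylor expansion of ln(x) around the mean rather than from the explicit optimization described in section 8.

```python
import math

def quadratic_bounds(f, a, b, mu, sigma, m, M, steps=10000):
    """Lower and upper bounds on the mean of f(x) from the quadratic a*x^2 + b*x.
    The extremes of f(x) - a*x^2 - b*x over [m, M] are approximated on a grid."""
    xs = [m + (M - m) * k / steps for k in range(steps + 1)]
    resid = [f(x) - a * x * x - b * x for x in xs]
    base = a * (sigma ** 2 + mu ** 2) + b * mu
    return base + min(resid), base + max(resid)

values = [10.0, 20.0, 50.0, 100.0]                 # hypothetical data values
n = len(values)
mu = sum(values) / n
sigma = math.sqrt(sum(v * v for v in values) / n - mu ** 2)   # population standard deviation
m, M = min(values), max(values)

# Taylor-series coefficients for ln(x) around mu: a = f''(mu)/2, b = f'(mu) - f''(mu)*mu.
a = -1.0 / (2.0 * mu ** 2)
b = 2.0 / mu

q_lo, q_hi = quadratic_bounds(math.log, a, b, mu, sigma, m, M)
actual = sum(math.log(v) for v in values) / n
print(q_lo, actual, q_hi)
```

An optimizer over a and b would call quadratic_bounds repeatedly and keep the smallest upper bound and the largest lower bound found.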

Results are contained in figures 13-1 and 13-2. Since the closed-form expressions are simple computations, in
a computer implementation it is advisable to try all the different bounds methods, and take the minimum of
the upper bounds to get a cumulative upper bound, and the maximum of the lower bounds to get a
cumulative lower bound.
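Combining the methods amounts to intersecting their intervals. A minimal sketch follows; the function name and the three input intervals are hypothetical placeholders, not values from the figures.

```python
def combine_bounds(bounds):
    """Intersect (lower, upper) pairs produced by several bounding methods."""
    lowers, uppers = zip(*bounds)
    lower, upper = max(lowers), min(uppers)
    if lower > upper:
        # Valid absolute bounds can never be disjoint, so this signals a bug upstream.
        raise ValueError("inconsistent bounds: at least one method is in error")
    return lower, upper

# Hypothetical intervals from three different methods.
combined = combine_bounds([(3.00, 3.20), (3.05, 3.15), (2.90, 3.11)])
print(combined)   # (3.05, 3.11)
```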

14. Application to correlated data 

An application of these ideas is to estimation of statistics of one attribute from those of another if the 
attributes are known to have a nonlinear correlation describable by a monotonic function such as we have 
been analyzing. We can then bound statistics on one attribute from statistics on the other. 

15. Direct optimization 

We should note there is another kind of optimization that can be applied to problems of this sort. We can
make the optimization variables the values themselves of an unknown distribution and perform a constrained
optimization with objective function the statistic on which bounds are desired, and with constraints the values
of known other statistics. Conceptually, this is a nice approach since it can be applied to arbitrary states of
prior knowledge and can bound arbitrary statistics.

We have done a number of experiments which we do not have the space here to discuss, and the idea seems
to work. However, we have found that this "direct optimization" is highly sensitive to optimization methods,
starting points, and step sizes, and is surprisingly difficult to get convergence for; unlike quadratic
optimization, the function optimized is not usually convex. But there is an even more serious problem with
direct optimization, a very fundamental one: it only gives lower bounds on upper bounds, and upper bounds
on lower bounds, unlike all the other bounds discussed in this paper, which are upper bounds on upper
bounds, and lower bounds on lower bounds. For instance, for our standard example we found a lower bound
on the upper bound of 3.09771 on the mean of the logarithms from direct optimization, but we have no idea
how much larger a bound is possible up to the quadratic-optimization bound of 3.10383, which represents an
absolute limit. Thus the utility of direct optimization is questionable in bounded statistical estimation, and
we do not see it as a challenge to the methods developed in this paper. (It does provide a useful tool for
debugging the methods, however, since for instance any supposed bound we find less than the upper bound
on the lower bound is in error.)
Figure 13-1: Some comparisons between different expressions
for bounds on the mean, for f(x)=ln(x)

For these results the quadratic optimum was verified to be equal to the one-sided bound
when allowing for roundoff error.

Figure 13-2: Some comparisons between different expressions
for bounds on the mean, for f(x)=1/x
16. Conclusion

We have developed some quick closed-form expressions for bounds on the mean and standard deviation of
a finite set of transformed numerical data values, where the transformation function has derivatives of
constant sign in the interval of interest. In making these estimates we use only statistics on the original set of
data values, and no actual values themselves. Our bounds provide a useful alternative to often
difficult-to-obtain confidence intervals, requiring no distributional assumptions whatsoever. Such bounds are
likely to be helpful for exploratory data analysis as an aid to getting a feel for the data, preliminary to detailed
hypothesis testing.